
5 MPEG Coding Principles

In this chapter we review some of the basic principles used in MPEG coding.
We use this chapter to introduce some of the terminology used through-
out the book, but our treatment here is necessarily brief. Readers who
have no familiarity with data compression will probably want to supple-
ment the treatment here with texts such as [RJ91], [BK95], [Say96], [Hof97]
and [PM93].
We first present a very high-level view of a coding system, showing how
the system can be divided into modeling and entropy coding sections. Then,
we discuss the very important topics of entropy coding and coding models,
in that order. Finally, we discuss the specifics of the MPEG-1 coding tech-
niques and present block diagrams for MPEG-1 encoders and decoders. Note
that our use of the term system in this chapter is unrelated to the MPEG
system layer.

5.1 Coding system structure


The high-level coding system diagram sketched in Figure 5.1 illustrates the
structure of a typical encoder and decoder system. The analog to digital
conversion (A/D) determines the basic resolution and precision of the input
data, and thus is a very important step in reducing the almost unlimited data
that potentially is available from the original scene to a manageable level.
However, data reduction does not necessarily stop once the digitization is
completed.
Compression systems that do no further data reduction once the picture
is digitized are lossless systems; these lossless compression systems rarely
compress natural image data by more than a factor of 2 to 3. Compression
systems such as MPEG need to achieve considerably more than an order
of magnitude higher compression than this, and they do this by means of
further lossy data reduction after the digitization.


As shown in Figure 5.1, it is convenient to separate a coding system into
two parts. The first part is the encoder model that performs lossy data
reduction in the process of changing the digital source data into a more
abstract representation which is conventionally labeled symbols. The second
part then codes these symbols in a process that minimizes the bitstream
length in a statistical sense. This second step is called entropy coding.1

5.1.1 Isolating the model


The decoder in Figure 5.1 reverses the encoding process, at least to the
extent possible. It first losslessly converts the compressed data back to
symbols, and then rebuilds a digital picture that is (hopefully) a visually
close approximation to the original digital source data. This digital data is
fed through a D/A (digital to analog) converter to recreate an analog output
signal for the display.
For the moment consider the entropy encoder and decoder to be “black
boxes” with the following properties: The entropy encoder accepts a stream
of symbols from the encoder model and converts it to compressed data;
the entropy decoder decodes that compressed data and returns an identical
stream of symbols to the decoder model. Since by definition the entropy
encoder and decoder are lossless, the compression system can be truncated
as shown in Figure 5.2. Although the system may not provide much
compression, it is still completely functional in other respects.

¹In [PM93] a slightly different decomposition is used in which an intermediate
stage of descriptors is defined. This was necessary because of the two
entropy-coding techniques used in JPEG. MPEG uses only one type of entropy
coding and this intermediate stage is not needed. We therefore use the more
conventional decomposition in this book.

5.1.2 Symbols and statistical modeling


Symbols are an abstract representation of the data that usually does not
permit an exact reversible reconstruction of the original source data (unless
specifically designed to be reversible). In general, the coding model involves
both data reduction and a recasting of the data into symbols. The symbols
chosen to represent the data are usually a less correlated representation that
allows an efficient and relatively simple entropy-coding step. The process of
choosing a set of symbols to represent the data involves a statistical model
of the data — so named, because it is based on a careful analysis of the
statistical properties of the data.
Statistical modeling pervades the design of the coding model. For ex-
ample, one coding model used extensively for lossless coding is DPCM (dif-
ferential pulse code modulation). In this coding technique a difference is
calculated between each pel and a prediction calculated from neighboring
pel values already transmitted. Pel differences turn out to be considerably
less correlated than the original pel values, and can be coded independently
with reasonable efficiency. Better coding efficiency is achieved, however,
when correlations between differences are taken into account in the statisti-
cal modeling.
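The DPCM idea can be sketched in a few lines of Python. This is an illustrative sketch, not MPEG's actual predictor definition: it assumes a one-dimensional scan of pel values and a starting prediction of zero.

```python
def dpcm_encode(pels):
    # Predict each pel from the previously transmitted pel and
    # transmit only the difference. The starting prediction of 0
    # is an assumption made for this sketch.
    diffs, prediction = [], 0
    for p in pels:
        diffs.append(p - prediction)
        prediction = p
    return diffs

def dpcm_decode(diffs):
    # Reverse the process: add each difference to the running prediction.
    pels, prediction = [], 0
    for d in diffs:
        prediction += d
        pels.append(prediction)
    return pels
```

Because each prediction uses only values already transmitted, the decoder can form exactly the same predictions as the encoder, and the reconstruction is lossless.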
An even more effective way of reducing correlation is with the DCT, as
has been mentioned in earlier chapters. However, before we get into detailed
discussion of DCT based models, we need to develop the topic of entropy
coding.

5.2 Entropy coding


An entropy encoder losslessly converts a sequence of symbols into compressed
data, and the entropy decoder reverses the process to produce the identical
sequence at the decoder. The task of the entropy coder is to encode the
sequence of symbols into the shortest possible bitstream.

5.2.1 Principles of entropy coding

In general, the symbols fed to the entropy encoder are members of an alpha-
bet, a term that requires some explanation. Suppose the task is to compress
a string of text consisting of a particular sequence of lowercase letters. If
all letters can occur in the sequence, the compression alphabet would then
consist of the 26 letters of the normal alphabet, a through z. However, if
the text might include both uppercase and lowercase letters, the compres-
sion alphabet must have 52 entries (symbols). If spaces are allowed in the
text, the space symbol must be added to the alphabet. If any punctuation
marks can occur, the compression alphabet must include them as well. If
all of the characters that can be expressed by an 8-bit code are allowed, the
alphabet must consist of 256 symbols; if two-byte codes are allowed, the al-
phabet should have 65,536 symbols. In other words, the alphabet is the set
of possible symbols, whereas the message — the actual stream of symbols
fed to the entropy encoder — is one of many possible streams. It is the task
of the entropy encoder to transmit this particular stream of symbols in as
few bits as possible, and this requires statistically optimum code lengths for
each symbol. The technique for making optimal integer code assignments is
called Huffman coding.
Entropy encoders are somewhat like intelligent gamblers. The currency
is denominated in bits and the objective is to spend as few bits as possible
in getting the symbol stream (the message) to the decoder. As with any
gambling scheme, symbol probabilities play a major role in how the bets
are placed. To win, the more probable symbols must be coded with fewer
bits than average. However, the only way this is possible is if the less
probable symbols are coded with more bits than average. Therein lies the
gamble. Usually, this intelligent gambler wins, but if a message containing
only symbols with very low probabilities occurs, the message is expanded
rather than compressed. The bet is then lost.

5.2.2 Mathematics of entropy coding


According to a fundamental theorem of information theory, the optimum
code length for a symbol (and also the number of bits of information
transmitted when that symbol is sent) is given by [Sha49]:

    length(s) = -log2 p(s)                                    (5.1)

where p(s) is the probability of symbol s. The entropy, which is simply the
average number of bits per symbol, is given by:

    H = - sum_s p(s) log2 p(s)                                (5.2)

where the sum is over all symbols in the alphabet. The entropy is a lower
bound on the average code length in a compressed message.
The following is not a proof of this theorem but may help in understanding
it: Suppose the alphabet consists of four possible symbols, a, b, c, and d.
Suppose further that a count of each symbol is kept for a long message and
each symbol has been encountered at least once. Now, starting with symbol a,
stack the symbol counts in succession, as shown in Figure 5.3.
Each symbol is associated with a particular count interval and thus can
be specified by selecting any number within that interval. Symbol a, for
example, is associated with the interval from 0 to 3 and can be selected
by any number between 0 and 3. Symbol b is associated with the interval
from 4 to 5 and therefore can be selected by the numbers 4 and 5. Symbol d
is associated with the interval from 8 to 15, and can be selected by any
number between 8 and 15. Note that there is no overlap in these intervals.
The first two columns of Table 5.1 summarize this.

The third column of Table 5.1 contains the binary representation of the
counts, and a very interesting thing can be seen there. For the largest count
interval, 8 to 15, only the most significant bit is unique and the lower-order
bits can be either zero or one. For the next smaller interval, 0 to 3, the
two most significant bits must be zero, but the rest can have either sense.
For the two smallest intervals three bits are required to uniquely specify
the interval.
When the bits that are not unique are discarded, the code words in the 4th
column are created. Note that these codes have the required unique prefix
property — that is, no short code is a prefix (the leading bits) of a longer
code.
If symbol probabilities are calculated from the symbol counts, ideal code
lengths can then be calculated from Equation 5.1, as shown in columns 5
and 6 of Table 5.1. These ideal code lengths exactly match the lengths of
the code words obtained from the unique leading bits.²
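The ideal code lengths for this example can be checked numerically. The sketch below assumes the counts implied by the intervals quoted above (4, 2, 2, and 8 for a, b, c, and d; the count of 2 for c is inferred from the remaining interval, 6 to 7).

```python
import math

# Counts implied by the intervals in Figure 5.3 (c's count is inferred
# from the remaining interval, 6 to 7).
counts = {"a": 4, "b": 2, "c": 2, "d": 8}
total = sum(counts.values())  # 16

# Ideal code length for each symbol: -log2(probability).
lengths = {s: -math.log2(n / total) for s, n in counts.items()}

# Entropy: the probability-weighted average of the ideal code lengths.
entropy = sum((n / total) * lengths[s] for s, n in counts.items())
```

For these power-of-2 probabilities the ideal lengths come out as whole numbers of bits (2, 3, 3, and 1), matching the unique-leading-bit code words, and the entropy is 1.75 bits per symbol.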

5.2.3 Huffman coding


Code lengths are readily determined when all probabilities are simple powers
of 2, but what does one do when a probability is, for example, 1/37? The
code assignment procedure worked out by Huffman [Huf52] solves this prob-
lem neatly, producing a compact code — one that is optimum, with unique
prefixes, and that uses all of the codes of each length in the table. Huffman
coding provides integer-length codes of optimum length, and only
noninteger-length coding techniques such as arithmetic coding
[Ris76, CWN87, PM93] can improve upon the coding efficiency of Huffman
coding.

5.2.4 Assigning Huffman codes


The procedure for assigning Huffman codes is based on pairing of symbols
and groups of symbols. Given the set of symbols, a, b, c, and d, illustrated in
Figure 5.3, the code tree in Figure 5.4 can be constructed.

²This way of partitioning the number line is, in fact, somewhat related to a
proof found in [BCW90] for the theorem in Equation 5.2.

In general, a tree
such as this is developed by pairing the two symbols or groups of symbols
of lowest probability and combining them to form a branch of the tree. The
probability of being on that branch is simply the sum of the probabilities
of the two symbols (or groups) paired. The tree is further developed until
only a single branch — the root of the tree — remains. The two branches of
each pair are then assigned a 1 and 0 code bit (the assignment is arbitrary).
Code words for a particular symbol are obtained by concatenating the code
bits, starting at the root and working back to the leaf corresponding to that
symbol. The entire set of code words generated in this manner is called a
Huffman code table. Although the example given here is for probabilities
that are exact powers of 2, this table generation technique applies generally.
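The pairing procedure can be sketched with a priority queue. This is an illustrative implementation, not code from any standard; the tie-breaking counter exists only so the heap never has to compare two symbol groups directly.

```python
import heapq
import itertools

def huffman_code(freqs):
    """Build a Huffman code table from a dict of symbol frequencies."""
    counter = itertools.count()  # tie-breaker for equal frequencies
    heap = [(f, next(counter), {s: ""}) for s, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Pair the two symbols or groups of lowest probability ...
        f1, _, g1 = heapq.heappop(heap)
        f2, _, g2 = heapq.heappop(heap)
        # ... and prepend a distinguishing bit to every code in each group;
        # the 0/1 assignment to the two branches is arbitrary.
        merged = {s: "0" + c for s, c in g1.items()}
        merged.update({s: "1" + c for s, c in g2.items()})
        heapq.heappush(heap, (f1 + f2, next(counter), merged))
    return heap[0][2]
```

Applied to the counts of Figure 5.3, this yields a one-bit code for d, a two-bit code for a, and three-bit codes for b and c, matching the ideal lengths computed earlier, and the resulting table has the unique prefix property.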

5.2.5 Adaptive entropy coding


MPEG is a nonadaptive coding system, and thus uses fixed code tables.
To set up the MPEG code tables, many image sequences were coded and
statistics were gathered for each code word in the table. Although ideally the
code word lengths should be exactly matched to the statistics, nonadaptive
coding is relatively robust when there are minor deviations from the ideal. If
a code word happens to be too long for one symbol, the code word assigned
to another symbol will be too short. Therefore, when the probabilities for
symbols are not quite matched to the assumed values, the increased number
of bits for some symbols is at least partly offset by a decreased number of bits
for others. If deviations are not minor, however, coding efficiency degrades,
and adaptive coding can then provide significantly better compression.
In adaptive entropy coding the code tables are modified to better match
the symbol probabilities. In the JPEG still image compression standard, for
example, Huffman tables matched to the statistics for a particular picture
can be transmitted as part of the compressed picture. JPEG also provides
an alternative form of entropy coding, adaptive arithmetic coding, that can
even adapt to changing statistics within an image.
There is, of course, a price to be paid for the coding efficiency gains of
adaptive coding, in that the system is usually more complex. Adaptive entropy
coding is well justified only when a substantial mismatch between assumed
and actual probabilities can occur.

5.2.6 Symbol probabilities


In Table 5.1 symbol probabilities were calculated from the symbol counts.
In doing this the assumption is made that the counts are averages over a
large number of “typical” messages. If the number of messages is large
enough, all of the symbols will occur sufficiently often that reliable values
for the probabilities can be calculated. It sometimes happens, however, that
a symbol is theoretically possible, but so rare that it never occurs in any of
the messages used to train the code table. The code table must be able to
represent any possible message, and this creates a problem known as the zero
frequency problem — if the frequency of occurrence is zero, the probability
calculated from the ratio of the symbol count to the total count is zero and
the ideal code length is therefore infinite. One commonly used solution to
the zero frequency problem is to start all symbol counts at one rather than
zero.3
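The count-initialization remedy can be sketched as follows; the function name and the alphabet are illustrative, not taken from any standard.

```python
def estimate_probs(observed_counts, alphabet, initial_count=1):
    # Start every symbol's count at initial_count so that symbols never
    # seen in the training messages still receive a nonzero probability
    # (and therefore a finite code length).
    counts = {s: initial_count + observed_counts.get(s, 0) for s in alphabet}
    total = sum(counts.values())
    return {s: n / total for s, n in counts.items()}
```

A symbol that never occurred in training still gets a small but nonzero probability, so the code table can represent any possible message.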

5.3 Statistical models


The statistical model is really distributed throughout the encoder model.
The act of decorrelating the data by taking pel differences or by computing
a DCT is in some sense part of the statistical model. However, there is
another very important task for the statistical model once this decorrelated
representation is obtained, and this is the creation of an efficient symbol
set.
It is certainly possible to code DCT coefficients or pel differences di-
rectly, calling these the symbols. However, whenever symbol probabilities
get too large, Huffman coding usually becomes inefficient because of the
restriction to integer-length code words. The shortest code length possible is 1 bit,
corresponding to a symbol probability of 0.5. As the probability rises above
0.5, the code length remains 1, even though ideally it should be shorter.

³This initialization of counts is one example of a technique for estimating
probabilities called Bayesian estimation. When each count is set to the same
initial value, the assumption is, in the absence of any information about
symbol counts, that all symbols are equiprobable. If some counts are set to
larger numbers than others, some prior knowledge is assumed. The initial
values used for the counts reflect the degree of confidence in the prior
knowledge about probabilities, and fractional values less than 1 imply a
great deal of uncertainty about the initial probabilities. Further
information on Bayesian estimation may be found in [PM93] and in [Wil91].
The solution to this problem is to combine symbols based on DCT coef-
ficients or pel differences, creating a new alphabet in which two or more of
the old symbols are represented by a single new symbol. This is a key part
of the statistical model.
When combining symbols, a separate symbol is needed for each possible
combination of the old symbols. The probability of each symbol is given
by the product of the probabilities of the old symbols combined in it. For
example, the DCT used in MPEG typically has many zero coefficients, es-
pecially at higher spatial frequencies. The probability of a zero value is
therefore well above 0.5, and coding the coefficients individually is quite in-
efficient. Instead, the zero coefficient values are combined in various ways
to create new symbols with lower probabilities. Since each possible combi-
nation must be given a symbol, combining symbols expands the size of the
alphabet. However, the coding efficiency is much improved.
Statistical models can also exploit conditional probabilities. Conditional
probabilities are probabilities that depend on a particular history of prior
events. For example, the probability of the letter u occurring in text is
about 0.02 for typical English text documents [WMB94]. However, if the
letter q occurs, u becomes much more probable — about 0.95, in fact. When
a probability is a function of past symbols coded we call it a conditional
probability. When code tables are based on conditional probabilities, the
coding is then conditioned on prior symbols.
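The q-u example can be made concrete by estimating a conditional probability from bigram counts. The counts below are made up for illustration, chosen only to echo the probabilities quoted above; they are not measured from any real corpus.

```python
def conditional_prob(bigram_counts, prev, nxt):
    # P(next | prev) = count(prev, next) / count(prev followed by anything)
    total = sum(n for (p, _), n in bigram_counts.items() if p == prev)
    return bigram_counts.get((prev, nxt), 0) / total if total else 0.0

# Illustrative (made-up) bigram counts, not from any real corpus:
bigrams = {("q", "u"): 95, ("q", "i"): 5, ("t", "h"): 40, ("t", "o"): 60}
```

A code table conditioned on the previous letter being q would then assign u a very short code, even though its unconditional probability is small.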
Not surprisingly, many compression systems use conditional probabili-
ties. MPEG, for example, changes the code tables, depending on the type of
picture being coded. More details of the statistical models used in MPEG
will be found in the following sections on coding models.

5.4 Coding models

As anyone who has ever examined a movie film strip knows, the individual
pictures in a movie are very similar from one picture to the next. In addition,
the two-dimensional array of samples that results from the digitizing of
each picture is also typically very highly correlated — that is, adjacent
samples within a picture are very likely to have a similar intensity. It is this
correlation, both within each picture and from picture to picture that makes
it possible to compress these sequences very effectively.

5.4.1 I-, P-, and B-pictures


MPEG divides the pictures in a sequence into three basic categories as il-
lustrated in Figure 2.4 in Chapter 2. Intra-coded pictures or I-pictures are
coded without reference to preceding or upcoming pictures in the sequence.
Predicted pictures or P-pictures are coded with respect to the temporally
closest preceding I-picture or P-picture in the sequence. Bidirectionally
coded pictures or B-pictures are interspersed between the I-pictures and
P-pictures in the sequence, and are coded with respect to the immediately
adjacent I- and P-pictures either preceding, upcoming, or both. Even though
several B-pictures may occur in immediate succession, B-pictures may never
be used to predict another picture.

5.4.2 MPEG coding models


Since I-pictures are coded without reference to neighboring pictures in the
sequence, the coding model must exploit only the correlations within the
picture. For I-pictures, the coding models used by MPEG are similar to
those defined by JPEG, and the reader may therefore find the extensive
discussion of coding models in [PM93] helpful.
P-pictures and B-pictures are coded as differences, the difference being
between the picture being coded and a reference picture. Where the image
has not changed from one picture to the next, the difference will be zero.
Only the areas that have changed need to be updated, a process known
as conditional replenishment. If there is motion in the sequence, a better
prediction can be obtained from pels in the reference picture that are shifted
relative to the current picture pels. Motion compensation is a very important
part of the coding models for P- and B-pictures and will be discussed in detail
in Chapter 11.
Given an I-picture or the motion-compensated difference array for a P-
or B-picture, two basic coding models are used to complete the coding pro-
cess: a discrete cosine transform model (DCT-based model) and a predictive
model. The DCT-based model is common to the coding of all of the picture
types and as discussed earlier, plays a central role in MPEG. The quantiza-
tion of the DCT coefficients permits the MPEG coding system to take good
advantage of the spatial frequency dependency of the human eye’s response
to luminance and chrominance (see Chapter 4). Furthermore, DCT coeffi-
cients are almost perfectly decorrelated, as shown in the pioneering work by
Ahmed, Natarajan, and Rao [ANR74]. This means that the coding models
do not need to consider conditional statistical properties of coefficients.
The coding of quantized DCT coefficients is lossless — that is, the de-
coder is able to reproduce the exact same quantized values. The coding
models are dependent, however, on the type of picture being coded.

5.4.3 Quantization in MPEG-1


As discussed in Chapters 3 and 4, the DCT decomposes the picture data
into underlying spatial frequencies. Since the response of the human visual
system is a function of spatial frequency, the precision with which each
coefficient is represented should also be a function of spatial frequency. This
is the role of quantization.
The quantization table is an array of 64 quantization values, one for
each DCT coefficient. These are used to divide the unquantized DCT coef-
ficients to reduce the precision, following rules described in Section 3.6.1 in
Chapter 3.
There are two quantization tables in MPEG-1, one for intra coding and
one for nonintra coding. The default intra coding table, shown in Table 5.2,
has a distribution of quantizing values that is roughly in accord with the
frequency response of the human eye, given a viewing distance of approxi-
mately six times the screen width and a 360x240 pel picture.
The nonintra default MPEG quantization table is flat with a fixed value
of 16 for all coefficients, including the DC term. The constant quantization
value for all coefficients is of interest, for it appears to be at odds with the
notion that higher spatial frequencies should be less visible. However, this
table is used for coding of differential changes from one picture to the next,
and these changes, by definition, involve temporal masking effects.
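The basic operation can be sketched as division by the table entry. This is a deliberately simplified illustration: the actual MPEG-1 rules in Section 3.6.1 add refinements (special rounding cases, among others) that are omitted here, and the quantizer scale appears only as a simple multiplier.

```python
def quantize(coef, q, quantizer_scale=1):
    # Simplified sketch: divide by the scaled table value and truncate
    # toward zero. The real MPEG-1 rules add further refinements.
    step = q * quantizer_scale
    return int(coef / step)  # Python's int() truncates toward zero

def dequantize(level, q, quantizer_scale=1):
    # Simplified reconstruction: multiply back by the scaled step size.
    return level * q * quantizer_scale
```

The reconstructed value differs from the original by up to one quantization step, which is where the lossy data reduction occurs.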

5.4.4 Coding of I-pictures


In I-pictures the DCT coefficients within a given block are almost completely
decorrelated. However there is still some correlation between the coefficients
in a given block and the coefficients of neighboring blocks. This is especially
true for the block averages represented by the DC coefficients. For this
reason, the DC coefficient is coded separately from the AC by a predictive
DPCM technique.

5.4.4.1 Coding DC coefficients in I-pictures


As shown in Equation 5.3, the DC value of the neighboring block just coded
(from the same component), P, is the prediction for the DC value, DC, of the
current block. The difference,

    DIFF = DC - P,                                            (5.3)

is usually close to zero.

Note that the block providing the prediction is determined by the coding
order of the blocks in the macroblock. Figure 5.5 provides a sketch of the
coding and prediction sequence.
The coding of the DC difference, DIFF, is done by coding a size category and
additional bits that specify the precise magnitude and sign. Size categories
and corresponding code words for luma and chroma are given in Table 5.3.
The size category determines the number of additional bits required to
fully specify the DC difference. If size in Table 5.3 is zero, the DC difference
is zero; otherwise, size bits are appended to the bitstream following the code for
size. These bits supply the additional information to fully specify the sign
and magnitude.
The coding of the difference values is done exactly as in JPEG. To code a
given difference, the size category is first determined and coded. If the
difference is negative, 1 is subtracted. Then, size low-order bits of the
difference are appended to the bitstream. Table 5.4 provides a few examples.
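The size-category rule can be sketched as follows, assuming the JPEG-style convention described above (subtract 1 from a negative difference, then append its size low-order bits). The variable-length code words for size from Table 5.3 are omitted.

```python
def encode_dc_difference(diff):
    # Size category = number of bits needed for the magnitude;
    # a zero difference has size 0 and no additional bits.
    size = abs(diff).bit_length()
    if size == 0:
        return 0, ""
    if diff < 0:
        diff -= 1  # negative differences: subtract 1 first
    # Append the 'size' low-order bits of the (adjusted) difference.
    bits = format(diff & ((1 << size) - 1), f"0{size}b")
    return size, bits
```

For example, a difference of 5 is coded as size 3 with the bits 101, while a difference of -3 is coded as size 2 with the bits 00.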

5.4.4.2 Coding AC coefficients in I-pictures


The decorrelation provided by the DCT permits the AC coefficients to be
coded independently of one another, and this greatly simplifies the coding
process. However, although the coding of a given coefficient is independent
of other coefficients, the position of the coefficient in the array is impor-
tant. Coefficients representing high spatial frequencies are almost always
zero, whereas low-frequency coefficients are often nonzero. To exploit this
behavior, the coefficients are arranged qualitatively from low to high spatial
frequency following the zigzag scan order shown in Figure 5.6. This zigzag
scan approximately orders the coefficients according to their probability of
being zero.
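The zigzag visiting order of Figure 5.6 can be generated by walking the diagonals of the block, alternating direction on each diagonal. A sketch, assuming the conventional scan in which the second coefficient visited is the one to the right of the DC term:

```python
def zigzag_order(n=8):
    """Return the (row, col) visiting order of the zigzag scan."""
    order = []
    for s in range(2 * n - 1):  # s = row + col indexes one diagonal
        diag = [(i, s - i) for i in range(n) if 0 <= s - i < n]
        # Odd diagonals are traversed with the row increasing,
        # even diagonals with the row decreasing.
        order.extend(diag if s % 2 else reversed(diag))
    return order
```

The resulting order visits all 64 coefficient positions exactly once, from the DC term toward the highest spatial frequencies.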
Even with zigzag ordering, many coefficients are zero in a typical 8x8
block; therefore, it is impossible to code the block efficiently without com-
bining zero coefficients into composite symbols.
A very important example of a composite symbol is the end-of-block
(EOB). The EOB symbol codes all of the trailing zero coefficients in the
zigzag-ordered DCT with a single code word. Typically, the EOB occurs
well before the midpoint of the array, and in fact, is so probable that it
is assigned a two-bit code. Note that if each of the coefficients were coded
independently, about 60 bits would be needed to code the same information.
EOB coding is illustrated in Figure 5.7. Coding starts with the lowest
frequency coefficient, and continues until no more nonzero coefficients re-
main in the zigzag sequence. The coding of the block is then terminated
with the end-of-block code. The end-of-block code is short, only two bits,
precisely because it is so probable.
Runs of zero coefficients also occur quite frequently before the end-of-
block. In this case, better coding efficiency is obtained when code words
(symbols) are defined combining the length of the zero coefficient run with
the amplitude of the nonzero coefficient terminating the run.
Each nonzero AC coefficient is coded using this run-level symbol struc-
ture. Run refers to the number of zero coefficients before the next nonzero
coefficient; level refers to the amplitude of the nonzero coefficient. Fig-
ure 5.7(c) illustrates this. The trailing bit of each run-level code is the bit,
s, that codes the sign of the nonzero coefficient. If s is 0, the coefficient is
positive; otherwise it is negative.
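The run-level decomposition can be sketched directly. This illustrative function produces the (run, level) pairs; mapping each pair to its code word from Table 5.5, and appending the trailing EOB code, is omitted.

```python
def run_level_pairs(zigzag_coefs):
    # Convert a zigzag-ordered coefficient list into (run, level) pairs,
    # where run counts the zero coefficients preceding each nonzero one.
    pairs = []
    run = 0
    for c in zigzag_coefs:
        if c == 0:
            run += 1
        else:
            pairs.append((run, c))
            run = 0
    return pairs  # trailing zeros are covered by the EOB code
```

A block whose zigzag sequence begins 5, 0, 0, -2, 1 and then contains only zeros reduces to just three pairs, (0, 5), (2, -2), and (0, 1), followed by the EOB.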
The MPEG AC code table reproduced in Table 5.5 has a large number
of run-level codes, one for each of the relatively probable run-level combina-
tions. The EOB symbol, with its two-bit code, is also shown. Combinations
of run lengths and levels not found in this table are coded by the escape
code, followed by a six-bit code for the run length and an eight- or 16-bit
code for the level.
Two codes are listed for run 0/level 1 in Table 5.5, and this may seem
a bit puzzling. First, why are there two codes for this? Second, why does
the binary code ‘1 s’ (where ‘s’ indicates the sign of the nonzero coefficient)
not conflict with the binary ‘10’ end-of-block (EOB) code? The reason for
this is the dual purpose of the code table — it is really two code tables
folded into one. In most situations the end-of-block can occur and the entry
labeled “next” is used. However, in nonintra coding a completely zero DCT
block is coded in a higher-level procedure and the end-of-block code cannot
occur before coding the first run-level. In this particular situation the entry
labeled “first” is used.
Codes labeled “first” are used only in nonintra coding. For intra cod-
ing, the DC is coded separately and an end-of-block can occur immediately
without any nonzero AC coefficients. Consequently, because the first entry,
‘1 s’, conflicts with the EOB code, it is not used.

5.4.5 Coding of P- and B-pictures


The decorrelation property of the DCT is really applicable only to intra-
coded pictures. Nonintra pictures are coded relative to a prediction from
another picture, and the process of predicting strongly decorrelates the data.
Fundamental studies by Rao and coworkers have shown that nonintra pic-
tures will not be optimally decorrelated by the DCT.4 However, since the
correlation is already quite small, any coding efficiency loss in ignoring co-
efficient correlation has to be small.
The real reason for using the DCT is quantization. Relatively coarse
quantization can be used for P- and B- pictures, and the bitrates are there-
fore very low. Even with a flat default quantization table that may not
fully exploit properties of the human visual system, DCT quantization is an
effective tool for reducing bit rate.
In P- and B-picture coding the DC coefficient is a differential value and
therefore is mathematically similar to the AC coefficients. Consequently,
AC and DC coding are integrated into one operation.
The coding of the DCT coefficients in P- and B-pictures is preceded by
a hierarchical coding sequence in which completely zero macroblocks (i.e.,
no nonzero DCT coefficients) are coded by a macroblock address increment
that efficiently codes runs of zero macroblocks.
Zero blocks in a nonzero macroblock are coded using the coded block
pattern (cbp), a six-bit variable in which each bit describes whether a cor-
responding DCT is active or completely zero. The variable-length codes for
the cbp efficiently code the zero blocks within the macroblock. Note that
the condition cbp=0 is handled by a macroblock address increment and the
macroblock is skipped.
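The cbp interpretation can be sketched as follows, assuming the usual block order (four luma blocks, then the two chroma blocks) with the most significant of the six bits corresponding to the first luma block.

```python
def coded_blocks(cbp):
    # cbp is a 6-bit pattern, one bit per block in the macroblock
    # (4 luma blocks, then 2 chroma blocks); the most significant
    # bit is assumed to correspond to block 0.
    return [i for i in range(6) if cbp & (1 << (5 - i))]
```

Only the blocks flagged in the pattern carry DCT coefficient data in the bitstream; the rest are reconstructed as all-zero blocks.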
⁴The work in reference [ANR74] suggests that a sine transform might work
better. However, this conclusion is based on artificially created data with
zero mean in which DC level shifts do not occur. As we have seen in Chapter 3,
the sine transform does not handle DC levels gracefully.
Finally, any nonzero blocks must be coded, and for this the codes in
Table 5.5 are used. This table, with the first entry for run 0/level 1 ignored,
has already been applied to intra coding of AC coefficients. For P- and
B-pictures, the full table is used.
When coding the first run-level, the first entry for run 0/level 1 is used. In
this case an end-of-block cannot occur before at least one nonzero coefficient
is coded, because the zero block condition would have been coded by a cbp
code. After that, the end-of-block is possible, and the second entry for
run 0/level 1 must be used.
Although the code table is not a true Huffman code when this substitu-
tion is made, the approximation is pretty good. It is a very clever way of
eliminating the need for two separate code tables.

5.5 Encoder and decoder block diagrams


At this point the various concepts discussed in this chapter and preceding
chapters can be merged to create high-level encoder and decoder block
diagrams.

5.5.1 Reconstruction module


A reconstruction module, common to both encoders and decoders, is shown
in Figure 5.8. This module is used to reconstruct the pictures needed for
prediction. Note that the signal definitions in this figure apply to the encoder
and decoder block diagrams that follow.
The reconstruction module contains a dequantizer unit (Inv Q), a DC
pred unit (for reconstructing the DC coefficient in intra-coded macroblocks)
and an IDCT unit for calculating the inverse DCT. The IDCT output is
merged with the prediction (zero, in the case of intra-coded macroblocks)
to form the reconstruction.
The prediction is calculated from data in the picture store, suitably
compensated in the case of nonintra coding for the forward and backward
motion displacements.

5.5.2 Encoder block diagram


In addition to the reconstruction module of Figure 5.8, the encoder shown
in Figure 5.9 has several other key functional blocks: a controller, a forward
DCT, a quantizer unit, a VLC encoder, and a motion estimator.
The modules are fairly self-explanatory: The controller provides syn-
chronization and control, the quantized forward DCT is computed in the
FDCT and Q modules, forward and backward motion estimation is carried
out in the motion estimator block, and the coding of the motion vectors

and DCT data is done in the VLC encoder. The motion estimation block
appears to be a simple element, but is perhaps the most complex part of
the system.
Note that the reconstruction module is general enough for both encoder
and decoder functions. In fact, a simpler version providing only I- and
P-picture reconstructions is sufficient for encoding.

5.5.3 Decoder block diagram


As in the encoder, the reconstruction module of Figure 5.8 is the central
block of the decoder. In this case motion displacements and DCT data are
decoded in the VLC decoder.