The rate-distortion efficiency of today's video compression schemes is based on a sophisticated interaction between various motion representation possibilities, waveform coding of differences, and waveform coding of various refreshed regions. Hence, a key problem in high-compression video coding is the operational control of the encoder. This problem is compounded by the widely varying content and motion found in typical video sequences, necessitating the selection between different representation possibilities with varying rate-distortion efficiency. This article addresses the problem of video encoder optimization and discusses its consequences on the compression architecture of the overall coding system. Based on the well-known hybrid video coding structure, Lagrangian optimization techniques are presented that try to answer the question: "What part of the video signal should be coded using what method and parameter settings?"

Video Compression Basics

Motion video data consists essentially of a time-ordered sequence of pictures, and cameras typically generate approximately 24, 25, or 30 pictures (or frames) per second. This results in a large amount of data that demands the use of compression. For example, assume that each picture has a relatively low "QCIF" (quarter-common-intermediate-format) resolution (i.e., 176 x 144 samples) for which each sample is digitally represented with 8 bits, and assume that we skip two out of every three pictures in order to cut down the bit rate. For color pictures, three color component samples are necessary to represent a sufficient color space for each pixel. In order to transmit even this relatively low-fidelity sequence of pictures, the raw source data rate is still more than 6 Mbit/s. However, today's low-cost transmission channels often operate at much lower data rates, so the data rate of the video signal needs to be further compressed.
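As a quick check of the figures quoted above, the arithmetic can be reproduced in a few lines of Python; the frame size, component count, bit depth, and frame rate are simply the values assumed in the text:

# Raw source data rate for the QCIF example in the text: 176 x 144 samples,
# 3 color components per pixel, 8 bits per sample, and 10 pictures per second
# (every third picture of a 30-pictures-per-second source).
width, height = 176, 144
components_per_pixel = 3
bits_per_sample = 8
pictures_per_second = 30 / 3

bits_per_picture = width * height * components_per_pixel * bits_per_sample
raw_rate_bps = bits_per_picture * pictures_per_second
print(f"raw source rate: {raw_rate_bps / 1e6:.2f} Mbit/s")  # about 6.08 Mbit/s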

A History of Existing Visual Coding Standards

H.120: The first international digital video coding standard [3]. It may have even been the first international digital compression standard for natural continuous-tone visual content of any kind (whether video or still picture). H.120 was developed by the ITU-T organization (the International Telecommunications Union-Telecommunications Standardization Sector, then called the CCITT), and received final approval in 1984. It originally was a conditional replenishment (CR) coder with differential pulse-code modulation (DPCM), scalar quantization, and variable-length coding, and it had an ability to switch to quincunx sub-sampling for bit-rate control. In 1988, a second version of H.120 added motion compensation and background prediction. (None of the later completed standards have yet included background prediction again, although a form of it is in the draft of the future MPEG-4 standard.) Its operational bit rates were 1544 and 2048 Kbit/s. H.120 is essentially no longer in use today, although a few H.120 systems are rumored to still be in operational condition.

H.261: The first widespread practical success: a video codec capable of operation at affordable telecom bit rates (with 80-320 Kbit/s devoted to video) [4, 5]. It was the first standard to use the basic typical structure we still find predominant today (16 x 16 macroblock motion compensation, 8 x 8 block DCT, scalar quantization, and two-dimensional run-level variable-length entropy coding). H.261 was approved by the ITU-T in early 1991 (with technical content completed in late 1990). It was later revised in 1993 to include a backward-compatible high-resolution graphics transfer mode. Its target bit-rate range was 64-2048 Kbit/s.

JPEG: A highly successful continuous-tone, still-picture coding standard named after the Joint Photographic Experts Group that developed it [1, 2]. Anyone who has browsed the world-wide web has experienced JPEG. JPEG (IS 10918-1/ITU-T T.81) was originally approved in 1992 and was developed as an official joint project of both the ISO/IEC JTC1 and ITU-T organizations. In its typical use, it is essentially H.261 INTRA coding with prediction of average values and an ability to customize the quantizer reconstruction scaling and the entropy coding to the specific picture content. However, there is much more in the JPEG standard than what is typically described or used. In particular, this includes progressive coding, lossless coding, and arithmetic coding.

MPEG-1: A widely successful video codec capable of approximately VHS videotape quality or better at about 1.5 Mbit/s and covering a bit-rate range of about 1-2 Mbit/s [6, 7]. MPEG-1 gets its acronym from the Moving Pictures Experts Group that developed it [6, 7]. MPEG-1 video (IS 11172-2) was a project of the ISO/IEC JTC1 organization and was approved in 1993. In terms of technical features, it added bi-directionally predicted frames (known as B-frames) and half-pixel motion. (Half-pixel motion had been proposed during the development of H.261, but was apparently thought to be too complex at the time.) It provided superior quality to H.261 when operated at higher bit rates. (At bit rates below, perhaps, 1 Mbit/s, H.261 performs better, as MPEG-1 was not designed to be capable of operation in this range.)

MPEG-2: A step higher in bit rate, picture quality, and popularity. MPEG-2 forms the heart of broadcast-quality digital television for both standard-definition and high-definition television (SDTV and HDTV) [7]-[9]. MPEG-2 video (IS 13818-2/ITU-T H.262) was designed to encompass MPEG-1 and to also provide high quality with interlaced video sources at much higher bit rates. Although usually thought of as an ISO standard, MPEG-2 video was developed as an official joint project of both the ISO/IEC JTC1 and ITU-T organizations, and was completed in late 1994. Its primary new technical features were efficient handling of interlaced-scan pictures and hierarchical bit-usage scalability. Its target bit-rate range was approximately 4-30 Mbit/s.

H.263: The first codec designed specifically to handle very low-bit-rate video, and its performance in that arena is still state-of-the-art [10, 11]. H.263 is the current best standard for practical video telecommunication. Its original target bit-rate range was about 10-30 Kbit/s, but this was broadened during development to perhaps at least 10-2048 Kbit/s as it became apparent that it could be superior to H.261 at any bit rate. H.263 (version 1) was a project of the ITU-T and was approved in early 1996 (with technical content completed in 1995). The key new technical features of H.263 were variable block-size motion compensation, overlapped-block motion compensation (OBMC), picture-extrapolating motion vectors, three-dimensional run-level-last variable-length coding, median MV prediction, and more efficient header information signaling (and, relative to H.261, arithmetic coding, half-pixel motion, and bi-directional prediction, although the first of these three features was also found in JPEG and some form of the other two were in MPEG-1). At very low bit rates (e.g., below 30 Kbit/s), H.263 can code with the same quality as H.261 using half or less than half the bit rate [12]. At greater bit rates (e.g., above 80 Kbit/s) it can provide a more moderate degree of performance superiority over H.261. (See also H.263+ below.)

H.263+: Technically a second version of H.263 [10, 13]. The H.263+ project added a number of new optional features to H.263. One notable technical advance over prior standards is that H.263 version 2 was the first video coding standard to offer a high degree of error resilience for wireless or packet-based transport networks. H.263+ also added a number of improvements in compression efficiency, custom and flexible video formats, scalability, and backward-compatible supplemental enhancement information. It was approved in January of 1998 by the ITU-T (with technical content completed in September 1997). It extends the effective bit-rate range of H.263 to essentially any bit rate and any progressive-scan (noninterlaced) picture formats and frame rates, and H.263+ is capable of superior performance relative to any existing standard over this entire range. The first author was the editor of H.263 during the H.263+ project and is the Rapporteur (chairman) of the ITU-T Advanced Video Coding Experts Group (SG16/Q15), which developed it.



For instance, using V.34 modems that transmit at most 33.4 Kbit/s over dial-up analog phone lines, we still need to compress the video bit rate further by a factor of about 200 (more if audio is consuming 6 Kbit/s of that same channel or if the phone line is too noisy for achieving the full bit rate of V.34).

One way of compressing video content is simply to compress each picture, using an image-coding syntax such as JPEG [1, 2]. The most common "baseline" JPEG scheme consists of breaking up the image into equal-size blocks. These blocks are transformed by a discrete cosine transform (DCT), and the DCT coefficients are then quantized and transmitted using variable-length codes. We will refer to this kind of coding scheme as INTRA-frame coding, since the picture is coded without referring to other pictures in the video sequence. In fact, such INTRA coding alone (often called "motion JPEG") is in common use as a video coding method today in production-quality editing systems that demand rapid access to any frame of video content.

However, improved compression performance can be attained by taking advantage of the large amount of temporal redundancy in video content. We will refer to such techniques as INTER-frame coding. Usually, much of the depicted scene is essentially just repeated in picture after picture without any significant change. It should be obvious then that the video can be represented more efficiently by coding only the changes in the video content, rather than coding each entire picture repeatedly. This ability to use the temporal-domain redundancy to improve coding efficiency is what fundamentally distinguishes video compression from still-image compression.

A simple method of improving compression by coding only the changes in a video scene is called conditional replenishment (CR), and it was the only temporal redundancy reduction method used in the first digital video coding standard, ITU-T Rec. H.120 [3]. CR coding consists of sending signals to indicate which areas of a picture can just be repeated, and sending new coded information to replace the changed areas. CR thus allows a choice between one of two modes of representation for each area, which are called the SKIP mode and the INTRA mode. However, CR coding has a significant shortcoming, which is its inability to refine an approximation. Often the content of an area of a prior picture can be a good approximation of the new picture, needing only a minor alteration to become a better representation. But CR coding allows only exact repetition or complete replacement of each picture area. Adding a third type of "prediction mode," in which a refining frame difference approximation can be sent, results in a further improvement of compression performance.

The concept of frame difference refinement can also be taken a step further, by adding motion-compensated prediction (MCP). Most changes in video content are typically due to the motion of objects in the depicted scene relative to the imaging plane, and a small amount of motion can result in a large difference in the values of the pixels in a picture area (especially near the edges of an object). Often, displacing an area of the prior picture by a few pixels in spatial location can result in a significant reduction in the amount of information that needs to be sent as a frame difference approximation. This use of spatial displacement to form an approximation is known as motion compensation, and the encoder's search for the best spatial displacement approximation to use is known as motion estimation. The coding of the resulting difference signal for the refinement of the MCP signal is known as displaced frame difference (DFD) coding.

▲ 1. Typical motion-compensated DCT video coder (the dotted box shows the decoder).
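As a rough numerical illustration of why displaced-frame-difference coding helps, the following Python sketch compares the residual energy left by a plain frame difference with that left after motion compensation with the correct displacement; the synthetic texture, the shift of (2, 3) pixels, and the noise level are arbitrary choices for this illustration, not values from the article:

import numpy as np

rng = np.random.default_rng(0)

# Synthetic "prior picture": a smoothed random texture, 72 x 72 samples,
# from which 64 x 64 windows are taken below.
prior = np.cumsum(np.cumsum(rng.normal(size=(72, 72)), axis=0), axis=1)

# "Current picture": the same content displaced by (2, 3) pixels plus mild
# noise, mimicking object or camera motion between pictures.
dy, dx = 2, 3
current = prior[dy:dy + 64, dx:dx + 64] + 0.1 * rng.normal(size=(64, 64))

def energy(residual):
    return float(np.sum(residual ** 2))

# Residual that a frame-difference coder would have to code (no displacement)
fd_residual = current - prior[:64, :64]

# Residual after motion compensation with the correct displacement vector
mcp_residual = current - prior[dy:dy + 64, dx:dx + 64]

print("frame-difference residual energy   :", energy(fd_residual))
print("motion-compensated residual energy :", energy(mcp_residual))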



Hence, the most successful class of video compression designs are called hybrid codecs. The naming of this coder is due to its construction as a hybrid of motion-handling and picture-coding techniques, and the term codec is used to refer to both the coder and decoder of a video compression system. Figure 1 shows such a hybrid coder. Its design and operation involve the optimization of a number of decisions, including
1. How to segment each picture into areas,
2. Whether or not to replace each area of the picture with completely new INTRA-picture content,
3. If not replacing an area with new INTRA content,
(a) How to do motion estimation; i.e., how to select the spatial shifting displacement to use for INTER-picture predictive coding (with a zero-valued displacement being an important special case),
(b) How to do DFD coding; i.e., how to select the approximation to use as a refinement of the INTER prediction (with a zero-valued approximation being an important special case), and
4. If replacing an area with new INTRA content, what approximation to send as the replacement content.

At this point, we have introduced a problem for the engineer who designs such a video coding system, which is: What part of the image should be coded using what method? If the possible modes of operation are restricted to INTRA coding and SKIP, the choice is relatively simple. However, hybrid video codecs achieve their compression performance by employing several modes of operation that are adaptively assigned to parts of the encoded picture, and there is a dependency between the effects of the motion estimation and DFD coding stages of INTER coding. The modes of operation are generally associated with signal-dependent rate-distortion characteristics, and rate-distortion trade-offs are inherent in the design of each of these aspects. The second and third items above in particular are unique to motion video coding. The optimization of these decisions in the design and operation of a video coder is the primary topic of this article. Some further techniques that go somewhat beyond this model will also be discussed.

An Overview of Future Visual Coding Standardization Projects

MPEG-4: A future visual coding standard for both still and moving visual content. The ISO/IEC SC29 WG11 organization is currently developing two drafts, called version 1 and version 2 of MPEG-4 visual. Final approval of version 1 is planned in January 1999 (with technical content completed in October 1998), and approval of version 2 is currently planned for approximately one year later. MPEG-4 visual (which will become IS 14496-2) will include most technical features of the prior video and still-picture coding standards, and will also include a number of new features such as zero-tree wavelet coding of still pictures, segmented shape coding of objects, and coding of hybrids of synthetic and natural video content. It will cover essentially all bit rates, picture formats, and frame rates, including both interlaced and progressive-scan video pictures. Its efficiency for predictive coding of normal camera-view video content will be similar to that of H.263 for noninterlaced video sources and similar to that of MPEG-2 for interlaced sources. For some special-purpose and artificially generated scenes, it will provide significantly superior compression performance and new object-oriented capabilities. It will also contain a still-picture coder that has improved compression quality relative to JPEG at low bit rates.

H.263++: Future enhancements of H.263. The H.263++ project is considering adding more optional enhancements to H.263 and is currently scheduled for completion late in the year 2000. It is a project of the ITU-T Advanced Video Coding Experts Group (SG16/Q15).

JPEG-2000: A future new still-picture coding standard. JPEG-2000 is a joint project of the ITU-T SG8 and ISO/IEC JTC1 SC29 WG1 organizations. It is scheduled for completion late in the year 2000.

H.26L: A future new generation of video coding standard with improved efficiency, error resilience, and streaming support. H.26L is currently scheduled for approval in 2002. It is a project of the ITU-T Advanced Video Coding Experts Group (SG16/Q15).

Motion-Compensated Video Coding Analysis

Consider the nth coded picture of size W x H in a video sequence, consisting of an array I_n(s) of color component values (e.g., Y_n(s), Cb_n(s), and Cr_n(s)) for each pixel location s = (x, y), in which x and y are integers such that 0 <= x < W and 0 <= y < H. The decoded approximation of this picture will be denoted as Ĩ_n(s).

The typical video decoder (see Fig. 1) receives a representation of the picture that is segmented into some number K of distinct regional areas {A_{k,n}}, k = 1, ..., K. For each area, a prediction-mode signal p_{k,n} ∈ {0, 1} is received, indicating whether or not the area is predicted from the prior picture. For the areas that are predicted from the prior picture, a motion vector (MV), denoted v_{k,n}, is received. The MV specifies a spatial displacement for motion compensation of that region. Using the prediction mode and the MV, the motion-compensated prediction (MCP) Î_n(s) of each pixel value in the area is formed as

Î_n(s) = p_{k,n} · Ĩ_{n−1}(s − v_{k,n})  for s ∈ A_{k,n}.   (1)


(Note: The MV v_{k,n} has no effect if p_{k,n} = 0, and so the MV is therefore normally not sent in that case.)

In addition to the prediction mode and MV information, the decoder receives an approximation Ẽ_n(s) of the DFD residual error E_n(s) between the true image value I_n(s) and its MCP Î_n(s). It then adds the residual signal to the prediction to form the final coded representation

Ĩ_n(s) = Î_n(s) + Ẽ_n(s).   (2)

Since there is often no movement in large parts of the picture, and since the representation of such regions in the previous picture may be adequate, video coders often provide special provisions for a SKIP mode of area treatment, which is efficiently transmitted using very short code words (p_{k,n} = 1, v_{k,n} = 0, Ẽ_{k,n}(s) = 0).

In video coders designed primarily for natural camera-view scene content, often little real freedom is given to the encoder for choosing the segmentation of the picture into region areas. Instead, the segmentation is typically either fixed to always consist of a particular two-dimensional block size (typically 16 x 16 pixels for prediction-mode signals and 8 x 8 for DFD residual content), or in some cases it is allowed to switch adaptively between block sizes (such as allowing the segmentation used for motion compensation to have either a 16 x 16 or 8 x 8 block size). This is because providing the encoder more freedom to specify a precise segmentation has generally not yet resulted in a significant improvement of compression performance for natural camera-view scene content (due to the number of bits needed to specify the segmentation), and also because determining the best possible segmentation in an encoder can be very complex. However, in special applications (especially those including artificially constructed picture content rather than camera-view scenes), segmented object-based coding can be justified. Rate-distortion optimization of segmentations for variable block-size video coding was first discussed in [30, 31], which was later enhanced to include dynamic programming to account for sequential dependencies in [37]-[39]. The optimization of coders that use object segmentation is discussed in an accompanying article [15].

Standard Hybrid Video Codec Terminology

The following terms are useful for understanding the various international standards for video coding:
prediction mode: A basic representation model that is selected for use in approximating a picture region (INTRA, INTER, etc.).
mode decision: An encoding process that selects the prediction mode for each region to be encoded.
block: A rectangular region (normally of size 8 x 8) in a picture. The discrete cosine transform (DCT) in standard video coders operates on 8 x 8 block regions.
macroblock: A region of size 16 x 16 in the luminance picture and the corresponding region of chrominance information (often an 8 x 8 region), which is associated with a prediction mode.
motion vector (MV): A spatial displacement offset for use in the prediction of an image region. In the INTER prediction mode an MV affects a macroblock region, while in the INTER+4V prediction mode, an individual MV is sent for each of the four 8 x 8 luminance blocks in a macroblock.
motion compensation: A decoding process that represents motion in each region of a picture by application of the transmitted MVs to the prior decoded picture.
motion estimation: An encoding process that selects the MVs to be used for motion compensation.
half-pixel motion: A representation of motion in which an MV may specify prediction from pixel locations that are halfway between the pixel grid locations in the prior picture, thus requiring interpolation to construct the prediction of an image region.
picture-extrapolating MVs: A representation of motion in which an MV may specify prediction from pixel locations that lie partly or entirely outside the boundaries of the prior picture, thus requiring extrapolation of the edges of the picture to construct the prediction of an image region.
overlapped-block motion compensation (OBMC): A representation of motion in which the MVs that represent the motion in a picture have overlapping areas of influence.
INTRA mode: A prediction mode in which the picture content of a macroblock region is represented without reference to a region in any previously decoded picture.
SKIP mode: A prediction mode in which the picture content of a macroblock region is represented as a copy of the macroblock in the same location in a previously decoded picture.
INTER mode: A prediction mode in which the picture content of a macroblock region is represented as the sum of a motion-compensated prediction using a motion vector, plus (optionally) a decoded residual difference signal representation.
INTER+4V mode: A prediction mode in which the picture content of a macroblock region is represented as in the INTER mode, but using four motion vectors (one for each 8 x 8 block in the macroblock).
INTER+Q mode: A prediction mode in which the picture content of a macroblock is represented as in the INTER mode, and a change is indicated for the inverse quantization scaling of the decoded residual signal representation.

Distortion Measures

Rate-distortion optimization requires an ability to measure distortion. However, the perceived distortion in visual content is a very difficult quantity to measure, as the characteristics of the human visual system are complex and not well understood. This problem is aggravated in video coding, because the addition of the temporal domain relative to still-picture coding further complicates the issue.


In practice, highly imperfect distortion models such as the sum of squared differences (SSD) or its equivalents, known as mean squared error (MSE) or peak signal-to-noise ratio (PSNR), are used in most actual comparisons. They are defined by

SSD_A(F, G) = Σ_{s ∈ A} | F(s) − G(s) |²,   (3)

MSE_A(F, G) = (1 / |A|) · SSD_A(F, G),   (4)

PSNR_A(F, G) = 10 · log₁₀( 255² / MSE_A(F, G) ) decibels.   (5)

Another distortion measure in common use (since it is often easier to compute) is the sum of absolute differences (SAD)

SAD_A(F, G) = Σ_{s ∈ A} | F(s) − G(s) |,   (6)

where F and G are two array arguments (such as luminance arrays of the actual and approximated pictures) and A is the region over which the measure is computed. These measures are often applied to only the luminance field of the picture during optimization processes, but better performance can be obtained by including all three color components. (The chrominance components are often treated as something of a minor nuisance in video coding; since they need only about 10% of the bit rate of the luminance, they provide a limited opportunity for optimization gain.)
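For concreteness, Eqs. (3)-(6) translate directly into a few lines of NumPy; this is only an illustrative sketch (the 8-bit peak value of 255 follows Eq. (5), and the example arrays are arbitrary):

import numpy as np

def ssd(f, g):
    # Sum of squared differences, Eq. (3).
    d = f.astype(np.float64) - g.astype(np.float64)
    return float(np.sum(d * d))

def mse(f, g):
    # Mean squared error, Eq. (4): SSD normalized by the region size |A|.
    return ssd(f, g) / f.size

def psnr(f, g):
    # Peak signal-to-noise ratio in decibels, Eq. (5), for 8-bit samples.
    return 10.0 * np.log10(255.0 ** 2 / mse(f, g))

def sad(f, g):
    # Sum of absolute differences, Eq. (6).
    return float(np.sum(np.abs(f.astype(np.float64) - g.astype(np.float64))))

# Example: distortion between an original and an approximated 16 x 16 block.
rng = np.random.default_rng(1)
original = rng.integers(0, 256, size=(16, 16))
approx = np.clip(original + rng.integers(-3, 4, size=(16, 16)), 0, 255)
print(psnr(original, approx), sad(original, approx))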
Effectiveness of Basic Technical Features

In the previous sections we described the various technical features of a basic modern video coder. The effectiveness of these features and the dependence of this effectiveness on video content is shown in Fig. 2. The upper plot of Fig. 2 shows performance for a videophone sequence known as Mother & Daughter, with moderate object motion and a stationary background. The lower plot of Fig. 2 shows performance for a more demanding scene known as Foreman, with heavy object motion and an unstable hand-held moving camera. Each sequence was encoded in QCIF resolution at 10 frames per second using the framework of a well-optimized H.263 [10] video encoder (using optimization methods described later in this article). (H.263 has 16 x 16 prediction-mode regions called macroblocks and 8 x 8 DCT-based DFD coding.)
Complicating Factors in Video Coding Optimization

The video coder model described in this article is useful for illustration purposes, but in practice actual video coder designs often differ from it in various ways that complicate design and analysis. Some of the important differences are described in the following few paragraphs.

Color chrominance components (e.g., Cb_n(s) and Cr_n(s)) are often represented with lower resolution (e.g., W/2 x H/2) than the luminance component of the image Y_n(s). This is because the human psycho-visual system is much more sensitive to brightness than to chrominance, allowing bit-rate savings by coding the chrominance at lower resolution. In such a system, the method of operation must be adjusted to account for the difference in resolution (for example, by dividing the MV values by two for chrominance components).

Since image values I_n(s) are defined only for integer pixel locations s = (x, y) within the rectangular picture area, the above model will work properly in the strict sense only if every motion vector v_{k,n} is restricted to have an integer value and only a value that causes access to locations in the prior picture that are within the picture's rectangular boundary. These restrictions, which are maintained in some early video-coding methods such as ITU-T Rec. H.261 [4], are detrimental to performance. More recent designs such as ITU-T Rec. H.263 [10] support the removal of these restrictions by using interpolation of the prior picture for any fractional-valued MVs (normally half-integer values, resulting in what is called half-pixel motion) and MVs that access locations outside the boundary of the picture (resulting in what we call picture-extrapolating MVs). The prediction of an image area may also be filtered to avoid high-frequency artifacts (as in Rec. H.261 [4]).

Often there are interactions between the coding of different regions in a video coder. The number of bits needed to specify an MV value may depend on the values of the MVs in neighboring regions. The areas of influence of different MVs can be overlapping due to overlapped-block motion compensation (OBMC) [16]-[19], and the areas of influence of coded transform blocks can also overlap due to the application of deblocking filters. While these cross-dependencies can improve coding performance, they can also complicate the task of optimizing the decisions made in an encoder. For this reason these cross-dependencies are often neglected (or only partially accounted for) during encoder optimization.

One important and often-neglected interaction between the coding of video regions is the temporal propagation of error. The fidelity of each area of a particular picture will affect the ability to use that picture area for the prediction of subsequent pictures. Real-time encoders must neglect this aspect to a large extent, since they cannot tolerate the delay necessary for optimizing a long temporal sequence of decisions while accounting for the temporal effects on many pictures. However, even nonreal-time encoders often neglect to account for this propagation in any significant way, due to the sheer complexity of adding this extra dimension to the analysis. An example of the exploitation of temporal dependencies in video coding can be found in [20]. The work of Ramchandran, Ortega, and Vetterli in [20] was extended by Lee and Dickinson in [21].


A gain in performance is shown for forming a CR coder by adding the SKIP coding mode to the encoder. Further gains in performance are shown when adding the various INTER coding modes to the encoder that were discussed in the previous sections:
▲ INTER (MV = (0,0) only): frame-difference coding with only zero-valued MV displacements
▲ INTER (full-pixel motion compensation): integer-pixel (full-pixel) precision motion compensation with DFD coding
▲ INTER (half-pixel motion compensation): half-pixel precision motion compensation with DFD coding
▲ INTER & INTER+4V: half-pixel precision motion compensation with DFD coding and the addition of an "advanced prediction" mode (H.263 Annex F), which includes a segmentation switch allowing a choice of either one or four MVs per 16 x 16 area and also includes overlapped-block motion compensation (OBMC) and picture-extrapolating MVs [10]. (The use of four MVs per macroblock is called the INTER+4V prediction mode.)

Except in the final case, the same H.263 baseline syntax was used throughout, with changes only in the coding method (the lower four curves are thus slightly penalized in performance by providing syntactical support for features that are never used in the encoding). In the final case, H.263 syntax was used with its D and F annexes active [10].

However, the coding results for the two sequences differ. In the low-motion sequence, the gain achieved by using CR (a choice of SKIP or INTRA) instead of just INTRA-picture coding is the most substantial, and as more features are added, the benefits diminish. On the high-motion sequence, CR is not very useful because the whole picture is changing from frame to frame, and the addition of motion compensation using the various INTER modes provides the most significant gain, with further gain added by each increasing degree of sophistication in motion handling.

▲ 2. Coding performance for the sequences Mother & Daughter (top) and Foreman (bottom) (QCIF, SKIP=2, Q = 4, 5, 7, 10, 15, 25).

Optimization Techniques

In the previous section, it was demonstrated that by adding coding options that are efficient in the rate-distortion sense to a video codec, the overall performance increases. The optimization task is to choose, for each image region, the most efficient coded representation (segmentation, prediction modes, MVs, quantization levels, etc.) in the rate-distortion sense. This task is complicated by the fact that the various coding options show varying efficiency at different bit rates (or levels of fidelity) and with different scene content.

For example, in H.263 [10], block-based motion compensation followed by quantization of the prediction error (INTER mode) is an efficient means for coding much of the key changing content in image sequences. On the other hand, coding a particular macroblock directly (INTRA mode) may be more productive in situations when the block-based translational motion model breaks down. For relatively dormant regions of the video, simply copying a portion of the previously decoded frame into the current frame may be preferred (SKIP mode). Intuitively, by allowing multiple modes of operation, we expect improved rate-distortion performance if the modes can significantly customize the coding for different types of scene statistics, and especially if the modes can be applied judiciously to different spatial and temporal regions of an image sequence.

The modes of operation that are assigned to the image regions have differing rate-distortion characteristics, and the goal of an encoder is to optimize its overall fidelity: minimize the distortion D, subject to a constraint R_c on the number of bits used R. This constrained problem reads as follows:

min { D },  subject to  R ≤ R_c.   (7)

The optimization task in Eq. (7) can be elegantly solved using Lagrangian optimization, where a distortion term is weighted against a rate term [22]-[32].



The Lagrangian formulation of the minimization problem is given by

min { J },  where  J = D + λ · R,   (8)

where the Lagrangian rate-distortion functional J is minimized for a particular value of the Lagrange multiplier λ. Each solution to Eq. (8) for a given value of the Lagrange multiplier λ corresponds to an optimal solution to Eq. (7) for a particular value of R_c [22, 23]. More details on Lagrangian optimization are discussed in the accompanying article by Ortega and Ramchandran [14].
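A small sketch may help illustrate how the unconstrained problem of Eq. (8) is used in place of the constrained problem of Eq. (7): one option is selected per region by minimizing D + λR, and λ is then adjusted by bisection until the total rate meets a target. The option lists and bisection bounds below are invented for the illustration and are not taken from the article:

def lagrangian_select(options_per_region, lam):
    # For each region, pick the (distortion, rate) option minimizing D + lam*R.
    picks = [min(opts, key=lambda o: o[0] + lam * o[1]) for opts in options_per_region]
    total_d = sum(d for d, r in picks)
    total_r = sum(r for d, r in picks)
    return picks, total_d, total_r

def meet_rate_budget(options_per_region, r_budget, lo=0.0, hi=1e4, iters=50):
    # Bisect on the Lagrange multiplier so that the unconstrained solution of
    # Eq. (8) (approximately) satisfies the rate constraint of Eq. (7).
    for _ in range(iters):
        lam = 0.5 * (lo + hi)
        _, _, total_r = lagrangian_select(options_per_region, lam)
        if total_r > r_budget:
            lo = lam   # too many bits: penalize rate more strongly
        else:
            hi = lam   # under budget: a smaller lambda may allow more bits
    return hi

# Toy example: two regions, each with (distortion, rate) points on its own
# operational rate-distortion curve (hypothetical numbers).
regions = [
    [(100.0, 1), (40.0, 5), (10.0, 20)],
    [(300.0, 2), (120.0, 8), (30.0, 30)],
]
lam = meet_rate_budget(regions, r_budget=25)
print(lagrangian_select(regions, lam))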
This technique has gained importance due to its effectiveness, conceptual simplicity, and its ability to effectively evaluate a large number of possible coding choices in an optimized fashion. If Lagrangian bit allocation and entropy codes for signaling the coding modes are used, the number of choices available for use need not be restricted to just a few. As a result, the computation time to test all the modes may become the limiting factor on performance, rather than the capabilities of the syntax itself.

In practice, a number of interactions between coding decisions must be neglected in video coding optimization. The primary problems are the use of motion estimation and prediction-mode decisions, and the common presence of cascading effects of decisions made for one region on the coding of subsequent regions in space and time. In addition, the overall bit rate must typically be controlled to match the channel capacity, which further complicates matters. All three quantities, D, λ, and R, tend to be subject to approximations and compromises in designing video coding systems.

Bit-Rate Control

The overall bit rate of a video coder is determined by its prediction-mode decisions, MV choices, and DFD coding fidelity. The last of these three is typically the most important for bit-rate control, and the residual fidelity is typically controlled by choosing a step-size scaling to be used for inverse quantization reconstruction of the transformed difference signal [33]. A larger step size results in a lower bit rate and a larger amount of distortion. Thus, the choice of step size is closely related to the choice of the relative emphasis to be placed on rate and distortion; i.e., the choice of λ. (The choice of the quantizer step-size scaling must be communicated to the decoder, but λ is an encoder-only issue and is not needed by the decoder.) As a last resort, the coding of entire pictures can be skipped by the encoder as a bit-rate control mechanism (resulting in a less fluid rendition of motion).

In some cases the bit rate must be controlled to maintain a constant local-average bit rate over time, but in other cases it may be allowed to vary much more widely (such as by allowing the amount of scene content activity to govern the bit rate). Whatever the constraints imposed on the bit rate of the system, control over λ in a well-optimized encoder can provide an excellent means of meeting those constraints. In a later section we will show how control over λ can be tightly linked to the more conventional practice of control over the inverse quantization step size.

A feedback control of the buffer state of video codecs was proposed by Choi and Park in [34], where the control is applied to the Lagrange multiplier λ. Trellis-based buffer control has been presented by Ortega, Ramchandran, and Vetterli in [35], where fast approximations are achieved using the Lagrangian formulation. A low-delay rate control method for H.263 was provided in [36]. There are many approaches to rate control; however, the use of the Lagrange multiplier method of optimization within these rate-control schemes can often help to avoid losses in coding performance that might otherwise result from their use.

Motion Estimation

Ideally, decisions should be controlled by their ultimate effect on the resulting pictures; however, this ideal may not be attainable in an encoder implementation. For example, in considering each possible MV to send for a picture area, an encoder should perform an optimized coding of the residual error and measure the resulting bit usage and distortion. Only by doing this can it really choose the best possible MV value to send (even if neglecting the effect of that choice on later choices spatially and later pictures temporally). However, there are typically thousands of possible MV values to choose from, and coding just one residual difference signal typically requires a significant fraction of the total computational power of a practical encoder.

A simpler method of performing motion estimation is to simply search for an MV that minimizes the prediction error prior to residual coding, perhaps giving some special preference to the zero-valued MV and to the MV value that requires the fewest bits to represent as a result of MV prediction in the decoder. These biases prevent spurious large MV values (which require a large number of bits to represent but may provide only little prediction benefit).

Further simplification is needed in real-time implementations. A straightforward minimum-squared-error "full-search" motion estimation that tests all possible integer values of an MV within a ±L range (video coding syntax typically supports L = 16 or L = 32, and one optional mode of H.263 supports an unlimited range) would require approximately 3(2L + 1)² operations per pixel (two adds and one multiply per tested MV value).



Adding half-pixel MVs to the search multiplies the number of MV values to test by a factor of four, and adds the requirement of an interpolation operation for generating the half-pixel sampling-grid locations in the prior picture. Such levels of complexity are beyond the capabilities of many of today's video coder implementations, and if this much computational power were available to an implementation, devoting it all to this type of search might not be the best way to gain performance. Motion estimation complexity is often reduced in implementations by the use of iterative refinement techniques. While we do not specifically address reduced-complexity motion estimation herein, rate-distortion optimization within the context of a reduced-complexity search can also often provide a performance benefit.

We can view MCP formation as a source coding problem with a fidelity criterion, closely related to vector quantization. For the number of bits required to transmit the MVs, MCP provides a version of the video signal with a certain fidelity. The rate-distortion trade-off can be controlled by various means. One approach is to treat MCP as entropy-constrained vector quantization (ECVQ) [24, 31]. Here, each image block to be encoded is quantized using its own codebook that consists of a neighborhood of image blocks of the same size in the previously decoded frames (as determined by the motion estimation search range). A codebook entry is addressed by the translational MVs, which are entropy coded. The criterion for the block motion estimation is the minimization of a Lagrangian cost function wherein the distortion, represented as the prediction error in SSD or SAD, is weighted against the number of bits associated with the translational MVs using a Lagrange multiplier.

An alternative interpretation is to view the motion search as an estimation problem: the estimation of a motion displacement field for the image. The problem of motion estimation becomes increasingly ill-conditioned as we increase the motion estimation search range and reduce the block size. The ill-conditioning results in a lack of consistency in the estimated MVs, resulting in a loss of accuracy in estimating true motion. The Lagrangian formulation can regularize the displacement field estimate. Hence, the Lagrangian formulation yields a solution to the problem not only when viewing motion estimation as a source coding technique, but also when viewing it as an ill-conditioned displacement field estimation problem.

Block motion estimation can therefore be viewed as the minimization of the Lagrangian cost function

J_MOTION = D_MOTION + λ_MOTION · R_MOTION,   (9)

in which the distortion D_MOTION, representing the prediction error measured as SSD or SAD, is weighted against the number of bits R_MOTION associated with the MVs using a Lagrange multiplier λ_MOTION. The Lagrange multiplier imposes the rate constraint as in ECVQ, and its value directly controls the rate-distortion trade-off, meaning that small values of λ_MOTION correspond to high fidelities and bit rates, and large values of λ_MOTION correspond to lower fidelities and bit rates. Sullivan and Baker proposed such a rate-distortion-optimized motion estimation scheme for fixed or variable block sizes in [31], and more work on the subject has appeared in [32] and [37]-[42].
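The following sketch illustrates the use of Eq. (9) for a single block: a full search over integer MVs that minimizes SAD plus λ_MOTION times an approximate MV bit count. The simple rate model (a cost growing with the difference from the predicted MV) merely stands in for the H.263 MV variable-length code, and the search bounds and block data are invented for the example:

import numpy as np

def mv_rate_estimate(mv, mv_pred):
    # Crude stand-in for the bits of an MV variable-length code: the cost
    # grows with the difference from the predicted MV (not the real H.263 table).
    dx, dy = mv[0] - mv_pred[0], mv[1] - mv_pred[1]
    return 2 * (abs(dx) + abs(dy)) + 2

def sad(block, candidate):
    return float(np.sum(np.abs(block.astype(np.float64) - candidate)))

def motion_search(block, prior, top, left, mv_pred, lam, search_range=15):
    # Minimize J_MOTION = SAD + lam * R(mv) over integer MVs, as in Eq. (9).
    n = block.shape[0]
    best = (None, float("inf"))
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + n > prior.shape[0] or x + n > prior.shape[1]:
                continue  # no picture-extrapolating MVs in this simple sketch
            candidate = prior[y:y + n, x:x + n].astype(np.float64)
            j = sad(block, candidate) + lam * mv_rate_estimate((dx, dy), mv_pred)
            if j < best[1]:
                best = ((dx, dy), j)
    return best

# Toy usage: a 16 x 16 block taken from a shifted position in the prior picture.
rng = np.random.default_rng(2)
prior = rng.integers(0, 256, size=(64, 64))
block = prior[20:36, 23:39]          # true displacement is (dx, dy) = (7, 4)
print(motion_search(block, prior, top=16, left=16, mv_pred=(0, 0), lam=10.0))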
Variable Block Sizes

The impact of the block size on MCP fidelity and bit rate is illustrated in Fig. 3 for the video sequences Mother & Daughter (top) and Foreman (bottom). For the data in this figure, the motion estimation and compensation were performed using the sequence of original video frames, with temporal subsampling by a factor of 3. The motion estimation was performed by minimizing J_MOTION in Eq. (9). In the first part of the motion estimation procedure, an integer-pixel-accurate displacement vector was found within a search range of [-15, 15] x [-15, 15] pixels relative to the location of the block to be searched. Then, given this integer-pixel-accurate displacement vector, its surrounding half-pixel positions were checked for improvements when evaluating Eq. (9). This second stage of the process is commonly called half-pixel refinement.

For the curves in Fig. 3, we tested the impact of the motion compensation block size on coding performance. For this test we evaluated three different sets of choices for 16 x 16 macroblock prediction modes:
▲ Case 1: INTER coding using only luminance regions of size 16 x 16 samples (choosing between the SKIP mode signaled with codeword "1" and the INTER mode signaled with codeword "0" followed by an MV for the 16 x 16 region)
▲ Case 2: INTER coding using only luminance blocks of size 8 x 8 samples (choosing between the SKIP mode signaled with codeword "1" and the INTER+4V mode signaled with codeword "0" followed by four MVs for 8 x 8 regions)
▲ Case 3: Combining cases 1 and 2 using a rate-constrained encoding strategy, which adapts the frequency of using the various region sizes using Lagrange multiplier optimization (choosing between "1" for SKIP, "01" for INTER with an MV for the 16 x 16 region, and "00" for INTER+4V with four MVs for 8 x 8 regions) [31].
Each MV was represented using the H.263 method of MV prediction and variable-length coding.

Case 1 can achieve better prediction than case 2 at the lowest mode-decision bit rates, since it can represent a moving area with one fourth as many MVs. However, case 1 cannot achieve the prediction quality of case 2 when the bit rate is higher, because case 2 can represent finer motion detail. Summarizing the comparison of cases 1 and 2, the use of 16 x 16 blocks is more beneficial at low rates, while 8 x 8 blocks are desirable at high rates. Case 3 can adaptively choose the proper block size as needed, so it obtains the best prediction at virtually all bit rates. (Case 1 has better prediction than case 3 at the very lowest mode-decision bit rates, since it does not require an extra bit per non-SKIP macroblock to distinguish between the INTER and INTER+4V cases.)

▲ 3. Prediction gain vs. MV bit rate for the sequences Mother & Daughter (top) and Foreman (bottom) when employing H.263 MV median prediction and original frames as reference frames.


The ultimate impact of the block size on the final objective coding fidelity is shown in Fig. 4. In this experiment, the residual coding stage and the INTRA coding mode were added to the scheme, producing a complete encoder. A complete H.263 video coder (using annexes D and F) was used for this experiment. The tendencies observed for the case of motion compensation only (see Fig. 3) also hold here. Allowing a large range of coding fidelities for MCP provides superior performance over the entire range of bit rates. However, every extra bit spent for motion compensation must be justified against other coding decisions made in mode selection and residual coding [32].

▲ 4. Coding performance for the sequences Mother & Daughter (top) and Foreman (bottom) when employing variable block sizes (QCIF, SKIP=2, Q = 4, 5, 7, 10, 15, 25).

Other Methods for Improving Motion-Compensated Prediction

Besides block size variation to improve the MCP, various other methods have been proposed. Examples of these schemes include
1. Multi-hypothesis MCP
2. Long-term memory MCP
3. Complex motion models
The scheme used for the first item, multi-hypothesis MCP, is that various signals are superimposed to compute the MCP signal. The multi-hypothesis motion-compensated predictor for a pixel location s ∈ A_{k,n} is defined as

Î_n(s) = Σ_{p=1}^{P} h_p(s) · Ĩ_{n−Δn}(s − v_{k,n,p}),   (10)

with Î_n(s) being a predicted pixel value and Ĩ_{n−Δn}(s − v_{k,n,p}) being a motion-compensated pixel from a decoded frame Δn time instants in the past (normally Δn = 1). This scheme is a generalization of Eq. (1), and it includes concepts like subpixel-accurate MCP [43, 44], B-frames [45], spatial filtering [4], and OBMC [16]-[19]. Using the linear filtering approach of Eq. (10), the accuracy of motion compensation can be significantly improved. A rationale for this approach is that if there are P different plausible hypotheses for the MV that properly represents the motion of a pixel s, and if each of these can be associated with a hypothesis probability h_p(s), then the expected value of the pixel prediction is given by Eq. (10), and an expected value is the estimator that minimizes the mean-square error in the prediction of any random variable. Another rationale is that if each hypothesis prediction is viewed as a noisy representation of the pixel, then performing an optimized weighted averaging of the results of several hypotheses as performed in Eq. (10) can reduce the noise. It should be obvious that if an optimized set of weights {h_p(s)}, p = 1, ..., P, is used in the linear combination of Eq. (10), the result cannot be worse on average than the result obtained from a single hypothesis as in Eq. (1). The multi-hypothesis MCP concept was introduced in [18], and an estimation-theoretic analysis with a focus on OBMC was presented in [19]. A rate-distortion efficiency analysis including OBMC and B-frames was presented in [46], and an algorithm to take advantage of MCP in an entropy-constrained framework was proposed in [47, 48].
mizes the mean-square error in the prediction of any ran-
dom variable. Another rationale is that if each hypothesis
prediction is viewed as a noisy representation of the pixel,
then performing an optimized weighted averaging of the
0 5 10 15 20 25 30 35 40
results of several hypotheses as performed in Eq. (10) can
Bit Rate [kbps]
reduce the noise. It should be obvious that ifan optimized
'I=,
set ofweights {ha(s)1 is used in the linear combination A 3. Prediction gain vs. MV bit rate for the sequences Mother &
(Eq. ( l o ) ) ,the result cannot be worse on average than the Daughter (top) and Foreman (bottom) when employing H.263
result obtained from a single hypothesis as in Eq. (1).The MV median prediction and original frames as reference
multi-hypothesis MCP concept was introduced in [ 181, frames.



Embedded in a complete video coder, the approach still yields significant coding gains, expressed in bit-rate savings of 23% for the sequence Foreman and 17% for the sequence Mother & Daughter due to the impact of long-term memory MCP, when comparing it to the rate-distortion optimized H.263 coder that is outlined in this article [49].

Complex motion models (the third item) have been proposed by a great number of researchers for improving motion compensation performance. The main effect of using a higher-order approximation of the displacement vector field (e.g., using polynomial motion models) is increased accuracy relative to what is achievable with translational motion models that relate to piecewise-constant approximation. In [50] and [51], a complete video codec is presented in which image segments are motion compensated using bilinear (12 parameter) motion models. The image segments partition a video frame down to a granularity of 8 x 8 blocks. Bit-rate savings of more than 25% were reported for the sequence Foreman [51].

INTRA/INTER/SKIP Mode Decision

Hybrid video coding consists of the motion estimation and residual coding stages, and an interface between them consisting of the prediction mode decision. The task for the residual coding is to represent signal parts that are not sufficiently approximated by the earlier stages. From the viewpoint of bit-allocation strategies, the various prediction modes relate to various bit-rate partitions. Considering the H.263 modes INTRA, SKIP, INTER, and INTER+4V, Table 1 gives typical values for the bit-rate partition between motion and DFD texture coding for typical sequences. The various modes in Table 1 relate to quite different overall bit rates. Since the choice of mode is adapted to the scene content, it is transmitted as side information.

Table 1. Bit-rate partition of the various H.263 modes.

Mode     Motion Coding Bit Rate [%]     Texture Coding Bit Rate [%]
INTRA    0                              100
INTER    30 ± 15                        70 ± 15

If we assume for the moment that the bit rate and distortion of the residual coding stage are controlled by the selection of a quantizer step size Q, then rate-distortion optimized mode decision refers to the minimization of the following Lagrangian functional

J(A, M, Q) = D_REC(A, M, Q) + λ_MODE · R_REC(A, M, Q),   (11)

where, for instance, M ∈ {INTRA, SKIP, INTER, INTER+4V} indicates a mode chosen for a particular macroblock, Q is the selected quantizer step size, D_REC(A, M, Q) is the SSD between the original macroblock A and its reconstruction, and R_REC(A, M, Q) is the number of bits associated with choosing M and Q.

A simple algorithm for rate-constrained mode decision minimizes Eq. (11) given all mode decisions of past macroblocks [26, 27]. This procedure partially neglects dependencies between macroblocks, such as the prediction of MV values from those of neighboring blocks and OBMC. In [52, 53], Wiegand et al. proposed the exploitation of mode-decision dependencies between macroblocks using dynamic programming methods. Later work on the subject, which also included the option to change the quantizer value on a macroblock-to-macroblock basis, appeared by Schuster and Katsaggelos [54].
84 IEEE SIGNAL PROCESSING MAGAZINE NOVEMBER 1998


Quantization

After DCT transformation, the residual signal must be quantized to form the final estimate. Ideally, the choice of quantizer step size Q should be optimized in a rate-distortion sense. Given a quantizer step size Q, the quantization of the residual signal (the mapping of the transformed samples to quantization index values) should also be rate-distortion optimized. The choice of the quantizer output level sent for a given input value should balance the needs of rate and distortion. A simple way to do this is to move the decision thresholds of the quantizer somewhat toward lower bit-rate indices [24, 25, 55]. This is the method used in the ITU-T test model [33]. Alternatively, a D + λR decision can be made explicitly to choose the quantization index. However, in modern video coders such as H.263, the bit rate needed to represent a given quantization index depends not only on the index chosen for a particular sample, but on the values of neighboring quantized indices as well (due to the structure of the coefficient index entropy coding method used). The best performance can be obtained by accounting for these interactions [29]. In recent video coder designs, the interactions have become complex, such that a trellis-based quantization technique may be justified. Such a quantization scheme was proposed by Ortega and Ramchandran [56], and a version that handles the more complex structure of the entropy coding of H.263 has recently appeared [57, 58]. Trellis-based quantization was reported to provide approximately a 3% reduction in the bit rate needed for a given level of fidelity when applied to H.263-based DCT coding [57, 58].

▲ 5. Relative occurrence vs. macroblock QUANT for various Lagrange parameter settings. The relative occurrences of macroblock QUANT values are gathered while coding 100 frames of the video sequences Foreman (a), Mobile & Calendar (b), Mother & Daughter (c), and News (d).


Choosing λ and the Quantization Step Size Q
The algorithm for the rate-constrained mode decision can be modified in order to incorporate macroblock quantization step-size changes. For that, the set of macroblock modes to choose from can be extended by also including the prediction mode type INTER+Q for each macroblock, which permits changing Q by a small amount when sending an INTER macroblock. More precisely, for each macroblock a mode M can be chosen from the set

M ∈ {INTRA, SKIP, INTER, INTER+4V, INTER+Q(-4), INTER+Q(-2), INTER+Q(+2), INTER+Q(+4)},    (12)

where, for example, INTER+Q(-2) stands for the INTER mode being coded with quantizer step size reduced by two relative to the previous macroblock. Hence, the macroblock Q selected by the minimization routine becomes dependent on λ_MODE. Otherwise the algorithm for running the rate-distortion optimized coder remains unchanged.

Figure 5 shows the relative occurrence of macroblock QUANT values (as QUANT is defined in H.263, Q is 2·QUANT) for several Lagrange parameter settings. The Lagrange parameter λ_MODE is varied over seven values: 4, 25, 100, 250, 400, 730, and 1000, producing seven normalized histograms that are depicted in the plots in Fig. 5. In Fig. 5, the macroblock QUANT values are gathered while coding 100 frames of the video sequences Foreman, Mobile & Calendar, Mother & Daughter, and News.

Figure 6 shows the obtained average macroblock QUANT gathered when coding the complete sequences Foreman, Mobile & Calendar, Mother & Daughter, and News. The red curve relates to the function

λ_MODE = 0.85 · QUANT²,    (13)

which is an approximation of the functional relationship between the macroblock QUANT and the Lagrange parameter λ_MODE up to QUANT values of 25; H.263 allows only a choice of QUANT ∈ {1, 2, ..., 31}. Particularly remarkable is the strong dependency between λ_MODE and QUANT, even for sequences with widely varying content. Note, however, that for a given value of λ_MODE, the chosen QUANT tends to be higher for sequences that require higher amounts of bits (Mobile & Calendar) in comparison to sequences requiring smaller amounts of bits for coding at that particular λ_MODE (Mother & Daughter), but these differences are rather small.

▲ 6. Lagrange parameter λ_MODE vs. average macroblock QUANT.

As a further justification of our simple approximation of the relationship between λ_MODE and Q, let us assume a typical quantization curve high-rate approximation [59, 60] as follows:

R(D) = a · ln(σ²/D),    (14)

where a is a constant that depends on the source pdf and σ² denotes the variance of the signal being quantized. The minimization (Eq. (8)) for a given value of λ_MODE can then be accomplished by setting the derivative of D + λ_MODE R(D) with respect to D equal to zero. This is equivalent to setting the derivative of R(D) with respect to D equal to -1/λ_MODE, which yields

λ_MODE = -dD/dR = D/a.    (15)

At sufficiently high rates, a reasonably well-behaved source probability distribution can be approximated as a constant within each quantization interval [60]. This leads readily to the typical high bit-rate approximation D ≈ (2·QUANT)²/12. The approximations then yield

λ_MODE ≈ c · QUANT²,    (16)

where c = 4/(12a). Although our assumptions may not be completely realistic, the derivation reveals at least the qualitative insight that it may be reasonable for the value of the Lagrange parameter λ_MODE to be proportional to the square of the quantization parameter. As shown above, 0.85 appears to be a reasonable value for use as the constant c.

This ties together two of the three optimization parameters, QUANT and λ_MODE. For the third, λ_MOTION, we make an adjustment to the relationship to allow use of the SAD measure rather than the SSD measure in that stage of encoding. Experimentally, we have found that an effective such method is to measure distortion during motion estimation using SAD and to simply adjust λ for the lack of the squaring operation in the error computation, as given by

λ_MOTION = √λ_MODE.    (17)

This strong dependency that we have thus derived between QUANT, λ_MODE, and λ_MOTION offers a simple treatment of each of these quantities as a dependent variable of another. For example, the rate control method may adjust the macroblock QUANT occasionally so as to control the average bit rate of a video sequence, while treating λ_MODE and λ_MOTION as dependent variables using Eqs. (13) and (17). In the experiments reported herein, we therefore used the approximation (17) with the SAD error measure for motion estimation and the approximation (13) with the SSD error measure for mode decisions.
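Taken together, Eqs. (13) and (17) let an encoder derive both Lagrange parameters from the macroblock QUANT alone, which is how they are treated as dependent variables here. The following minimal helper simply restates the two approximations; for QUANT = 10, for instance, it yields λ_MODE = 85 and λ_MOTION ≈ 9.2.

```python
import math

# Lagrange parameters as dependent variables of the H.263 macroblock QUANT,
# restating the approximations in the text: Eq. (13) lambda_MODE = 0.85*QUANT^2
# (used with the SSD measure) and Eq. (17) lambda_MOTION = sqrt(lambda_MODE)
# (used with the SAD measure).

def lambda_mode(quant):
    return 0.85 * quant * quant

def lambda_motion(quant):
    return math.sqrt(lambda_mode(quant))

for q in (4, 10, 25):
    print(f"QUANT={q:2d}  lambda_MODE={lambda_mode(q):7.1f}  "
          f"lambda_MOTION={lambda_motion(q):5.1f}")
```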



Comparison to Other Encoding Strategies
The ITU-T Video Coding Experts Group (ITU-T Q.15/SG16) maintains an internal document describing examples of encoding strategies, which is called its test model [33, 61]. The mode decision and motion estimation optimization strategies described above, along with the method of choosing λ_MODE based on quantizer step size as shown above, were recently proposed by the second author and others for inclusion into this test model [62, 63]. The group, which is chaired by the first author, had previously been using a less-optimized encoding approach for its internal evaluations [61], but accepted these methods in the creation of a more recent model [33]. The test model documents, the other referenced Q.15 documents, and other information relating to ITU-T Video Coding Experts Group work can be found on an ftp site maintained by the group (ftp://standard.pictel.com/video-site). Reference software for the test model is available by ftp from the University of British Columbia (ftp://dspftp.ee.ubc.ca/pub/tmn, with further information at http://www.ece.ubc.ca/spmg/research/motion/h263plus).

▲ 7. Coding performance for the sequences Mother & Daughter (top) and Foreman (bottom) when comparing the TMN-9 to the TMN-10 encoding strategy. (PSNR vs. bit rate [kbps] for QCIF sequences with SKIP=2 and Q = 4, 5, 7, 10, 15, 25; curves for Annexes D+F with TMN-9 MD and ME, with TMN-10 MD and TMN-9 ME, and with TMN-10 MD and ME.)

The less-sophisticated TMN-9 mode-decision method is based on thresholds. It compared the sum of absolute differences of the 16 × 16 macroblock (A) with respect to its mean value to the minimum prediction SAD obtained by an integer-pixel motion search, in order to make its decision between INTRA and INTER modes according to whether

A < min{SAD(fullpixel, 16 × 16)} − 500.    (18)

When this inequality was satisfied, the INTRA mode would be chosen for that particular macroblock. The min{SAD(fullpixel, 16 × 16)} value above corresponds to the minimum SAD value after integer-pixel motion compensation using a 16 × 16 motion compensation block size, where the SAD value of the (0,0) MV is reduced by 100 to bias the decision toward choosing the SKIP mode. If the INTER mode is chosen (i.e., if the inequality above is not satisfied), the chosen integer-pixel MV is half-pixel refined. The MVs for INTER+4V blocks were found by half-pixel refining the integer-pixel MV of the 16 × 16 blocks. Finally, the INTER+4V mode was chosen if

Σ_{i=0}^{3} min{SAD_i(halfpixel, 8 × 8)} < min{SAD(halfpixel, 16 × 16)} − 200    (19)

was satisfied, where min{SAD_i(halfpixel, 8 × 8)} is the minimum SAD value of the i'th of the four 8 × 8 blocks. The SKIP mode was chosen in TMN-9 only if the INTER mode was chosen as better than the INTRA mode and the MV components and all of the quantized transform coefficients were zero.
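For contrast with the Lagrangian approach, the TMN-9 rules quoted above can be condensed into a few lines. The SAD inputs are assumed to come from an external motion search, and the threshold constants simply mirror Eqs. (18) and (19) as reconstructed here, so the sketch is illustrative rather than normative.

```python
# Sketch of the threshold-based TMN-9 mode decision described above. The SAD
# values are assumed to come from an external integer/half-pixel motion search;
# sad_full_16x16 is assumed to already include the bias of -100 applied to the
# (0,0) vector. The threshold constants mirror Eqs. (18) and (19) as given here.

def tmn9_mode_decision(A, sad_full_16x16, sad_half_16x16, sad_half_8x8,
                       mv_is_zero, all_coeffs_zero,
                       intra_threshold=500, fourv_threshold=200):
    """A is the SAD of the macroblock with respect to its own mean value."""
    if A < sad_full_16x16 - intra_threshold:                    # Eq. (18)
        return "INTRA"
    if sum(sad_half_8x8) < sad_half_16x16 - fourv_threshold:    # Eq. (19)
        return "INTER+4V"
    if mv_is_zero and all_coeffs_zero:                          # SKIP rule
        return "SKIP"
    return "INTER"

print(tmn9_mode_decision(A=4000, sad_full_16x16=3000, sad_half_16x16=2800,
                         sad_half_8x8=[650, 640, 660, 645],
                         mv_is_zero=False, all_coeffs_zero=False))
```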



In the TMN-10 rate-distortion optimized strategy, the motion search uses rate-constrained motion estimation by first finding the best integer-pixel MV in the search range of ±15 pixels. Then, the best integer-pixel MV is half-pixel refined by again minimizing the Lagrangian cost functional for motion estimation given in Eq. (9). This procedure is executed for both 16 × 16 and 8 × 8 blocks. The mode decision of TMN-10 is then conducted using the rate-distortion optimized method described in this article.

The role of the encoding strategy is demonstrated in Fig. 7 for the video sequences Mother & Daughter and Foreman. The same syntax (H.263 using Annexes D and F) was used throughout, with changes only in the mode decision and motion estimation coding methods. These changes are:
▲ Case 1: TMN-9 mode decision and TMN-9 motion estimation
▲ Case 2: TMN-10 mode decision and TMN-9 motion estimation
▲ Case 3: TMN-10 mode decision and TMN-10 motion estimation
Case 2 has been included to demonstrate the impact of rate-constrained mode decision and motion estimation separately. Comparing the three cases, we find that the usage of the full motion estimation search range of ±15 pixels for the 8 × 8 block displacement vectors in INTER+4V mode provides most of the gain for the TMN-10 encoding strategy. The INTER+4V prediction mode is very seldom used in TMN-9, indicating that the TMN-9 motion estimation and mode decision rules basically fail to make effective use of this mode. In the highly active Foreman sequence, TMN-10 (Case 3) uses this mode for about 15% of macroblocks, whereas TMN-9 (Case 1) uses it for only about 2%.

The TMN-9 motion estimation strategy only permits the use of half-pixel positions for the 8 × 8 block displacement vectors that immediately surround the previously selected 16 × 16 block displacement vector that is searched in a ±15 range. We have observed that using the full search range for the 8 × 8 block displacement vectors leads to improved coding performance for the rate-constrained motion estimation, whereas for the TMN-9 motion estimation, using the full search for this small block size would actually harm the TMN-9 results, since no rate constraint was employed in its search. Only adding a rate constraint to the motion estimation can allow the INTER+4V mode to perform with its full potential.

Figure 8 shows that the TMN-10 coder uses about twice as many bits for motion as the other two coders in order to obtain a better prediction, so it can use less difference coding and still obtain an improvement in the overall performance. This is partly because of more frequent use of the INTER+4V mode and partly because of the larger motion estimation search range considered for the 8 × 8 blocks when the INTER+4V mode is chosen.

▲ 8. Bit-rate partition of motion vectors vs. bit rate for the sequences Mother & Daughter (top) and Foreman (bottom) when employing TMN-10 mode decision and motion estimation. (MOTION bit rate vs. overall bit rate [kbps] for QCIF sequences with SKIP=2 and Q = 4, 5, 7, 10, 15, 25; curves for Annexes D+F with TMN-9 MD and ME, with TMN-10 MD and TMN-9 ME, and with TMN-10 MD and ME.)

In the TMN-10 strategy, the bit rate allocated to the motion part of the information increases as the overall bit rate increases, which makes intuitive sense. The TMN-9 motion estimation shows completely different and sometimes counterintuitive behavior. For the sequence Foreman, the MV bit rate actually decreases as overall bit rate increases. This results from the fact that the TMN-9 motion estimation does not employ a rate constraint and that motion estimation is performed using the reconstructed frames (for TMN-9 as well as for TMN-10). As bit rate decreases, these reconstructed frames get noisier, and, since the regularization by the rate constraint is missing for the TMN-9 motion estimation, the estimates for the motion data get noisier and require a higher bit rate.

Rate-constrained mode decision, as employed in TMN-10, provides rather minor gains, but is conceptually simple and introduces a reasonably small computational overhead for some implementations. The overall performance gain of the improved mode decision and motion-estimation methods is typically around 10% in bit rate, or 0.5 dB in PSNR.
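The rate-constrained motion search that distinguishes TMN-10 can be sketched as follows; the motion-vector bit count below is a simplified stand-in for the actual H.263 median prediction and VLC tables, so only the SAD + λ_MOTION·R(MV) trade-off of Eq. (9) is illustrated.

```python
# Sketch of rate-constrained integer-pixel motion estimation: the chosen vector
# minimizes J = SAD + lambda_MOTION * R(mv). The bit count R(mv) is a crude
# model in which cost grows with distance from a predicted vector.

def sad(block_a, block_b):
    return sum(abs(a - b) for a, b in zip(block_a, block_b))

def motion_bits(mv, predictor):
    return 2 * (abs(mv[0] - predictor[0]) + abs(mv[1] - predictor[1])) + 2

def rate_constrained_search(get_candidate, current_block, lambda_motion,
                            predictor=(0, 0), search_range=15):
    best_mv, best_cost = None, float("inf")
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            cost = (sad(current_block, get_candidate(dx, dy))
                    + lambda_motion * motion_bits((dx, dy), predictor))
            if cost < best_cost:
                best_mv, best_cost = (dx, dy), cost
    return best_mv, best_cost

# Toy usage: a reference that matches the block exactly at displacement (3, -2).
current = list(range(16))
def get_candidate(dx, dy):
    return current if (dx, dy) == (3, -2) else [v + 20 for v in current]

print(rate_constrained_search(get_candidate, current, lambda_motion=9.2))
```

Without the λ_MOTION term the search would simply minimize SAD, which is the unregularized behavior whose drawbacks for TMN-9 are discussed above.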



Conclusions
We have described the structure of typical video coders and shown that their design and operation require a keen understanding and analysis of the trade-offs between bit rate and distortion. The single powerful principle of D + λR Lagrange multiplier optimization [22] has emerged as the weapon of choice in the optimization of such systems and can provide significant benefits if judiciously applied.

Acknowledgments
The authors wish to thank Klaus Stuhlmüller, Niko Färber, Bernd Girod, Barry Andrews, Philip Chou, and the PictureTel research team for their support and useful discussions. They also wish to thank guest editors Antonio Ortega and Kannan Ramchandran, as well as Jonathan Su, John Villasenor and his UCLA research team, Faouzi Kossentini and his UBC research team including Michael Gallant, and the anonymous reviewers for their valuable comments. The work of Thomas Wiegand was partially funded by 8x8, Inc.

Gary Sullivan is the manager of communication core research with PictureTel Corporation in Andover, Massachusetts, USA. Thomas Wiegand is a Ph.D. student with the University of Erlangen-Nuremberg in Erlangen, Germany.

References
1. ITU-T (formerly CCITT) and ISO/IEC JTC1, "Digital Compression and Coding of Continuous-Tone Still Images," ISO/IEC 10918-1 - ITU-T Recommendation T.81 (JPEG), Sept. 1992.
2. W.B. Pennebaker and J.L. Mitchell, JPEG: Still Image Data Compression Standard, Van Nostrand Reinhold, New York, USA, 1993.
3. ITU-T (formerly CCITT), "Codec for Videoconferencing Using Primary Digital Group Transmission," ITU-T Recommendation H.120; version 1, 1984; version 2, 1988.
4. ITU-T (formerly CCITT), "Video codec for audiovisual services at p x 64 kbit/s," ITU-T Recommendation H.261; version 1, Nov. 1990; version 2, Mar. 1993.
5. M.L. Liou, "Overview of the p x 64 kbps video coding standard," Communications of the ACM, vol. 34, pp. 47-58, Apr. 1991.
6. ISO/IEC JTC1, "Coding of moving pictures and associated audio for digital storage media at up to about 1.5 Mbit/s - Part 2: Video," ISO/IEC 11172-2 (MPEG-1), Mar. 1993.
7. J.L. Mitchell, W.B. Pennebaker, C. Fogg, and D.J. LeGall, MPEG Video Compression Standard, Chapman and Hall, New York, USA, 1997.
8. ITU-T (formerly CCITT) and ISO/IEC JTC1, "Generic coding of moving pictures and associated audio information - Part 2: Video," ITU-T Recommendation H.262 - ISO/IEC 13818-2 (MPEG-2), Nov. 1994.
9. B.G. Haskell, A. Puri, and A.N. Netravali, Digital Video: An Introduction to MPEG-2, Chapman and Hall, New York, USA, 1997.
10. ITU-T (formerly CCITT), "Video coding for low bitrate communication," ITU-T Recommendation H.263; version 1, Nov. 1995; version 2, Jan. 1998.
11. K. Rijkse, "H.263: video coding for low-bit-rate communication," IEEE Communications Magazine, vol. 34, no. 12, pp. 42-45, Dec. 1996.
12. B. Girod, E. Steinbach, and N. Färber, "Performance of the H.263 video compression standard," Journal of VLSI Signal Processing: Systems for Signal, Image, and Video Technology, 1997.
13. B. Erol, M. Gallant, G. Côté, and F. Kossentini, "The H.263+ video coding standard: complexity and performance," in Proceedings of the IEEE Data Compression Conference, Snowbird, Utah, USA, pp. 259-268, Mar. 1998.
14. A. Ortega and K. Ramchandran, "Rate-distortion methods for image and video compression: An overview," this issue, pp. 23-50.
15. G.M. Schuster, G. Melnikov, and A.K. Katsaggelos, "Operationally optimal vertex-based shape coding," this issue, pp. 91-108.
16. H. Watanabe and S. Singhal, "Windowed motion compensation," in Proceedings of the SPIE Conference on Visual Communications and Image Processing, vol. 1605, pp. 582-589, 1991.
17. S. Nogaki and M. Ohta, "An overlapped block motion compensation for high quality motion picture coding," in Proceedings of the IEEE International Symposium on Circuits and Systems, vol. 1, pp. 184-187, May 1992.
18. G.J. Sullivan, "Multi-hypothesis motion compensation for low bit-rate video coding," in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, Minneapolis, MN, USA, vol. 5, pp. 437-440, Apr. 1993.
19. M.T. Orchard and G.J. Sullivan, "Overlapped block motion compensation: an estimation-theoretic approach," IEEE Transactions on Image Processing, vol. 3, no. 5, pp. 693-699, Sept. 1994.
20. K. Ramchandran, A. Ortega, and M. Vetterli, "Bit allocation for dependent quantization with applications to multiresolution and MPEG video coders," IEEE Transactions on Image Processing, vol. 3, no. 5, pp. 533-545, Sept. 1994.
21. J. Lee and B.W. Dickinson, "Joint optimization of frame type selection and bit allocation for MPEG video coders," in Proceedings of the IEEE International Conference on Image Processing, Austin, USA, vol. 2, pp. 962-966, Nov. 1994.
22. Y. Shoham and A. Gersho, "Efficient bit allocation for an arbitrary set of quantizers," IEEE Transactions on Acoustics, Speech and Signal Processing, vol. 36, pp. 1445-1453, Sept. 1988.
23. H. Everett III, "Generalized Lagrange multiplier method for solving problems of optimum allocation of resources," Operations Research, vol. 11, pp. 399-417, 1963.
24. P.A. Chou, T. Lookabaugh, and R.M. Gray, "Entropy-constrained vector quantization," IEEE Transactions on Acoustics, Speech and Signal Processing, vol. 37, no. 1, pp. 31-42, Jan. 1989.
25. A. Gersho and R.M. Gray, Vector Quantization and Signal Compression, Kluwer Academic Publishers, Boston, USA, 1991.
26. T. Watanabe, Y. Tsukuhara, and K. Ohzeki, "Rate-adaptive DCT coding for color picture," in Proceedings of the Picture Coding Symposium, Boston, MA, paper no. 3.13, Mar. 1990.
27. S.-W. Wu and A. Gersho, "Rate-constrained optimal block-adaptive coding for digital tape recording of HDTV," IEEE Transactions on Circuits and Systems for Video Technology, vol. 1, no. 1, pp. 100-112, Mar. 1991.
28. S.-W. Wu and A. Gersho, "Enhanced video compression with standardized bit stream syntax," in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, Minneapolis, MN, USA, vol. 1, pp. 103-106, Apr. 1993.
29. S.-W. Wu, Enhanced Image and Video Compression with Constraints on the Bit Stream Format, Ph.D. thesis, University of California, Santa Barbara, Mar. 1993.
30. G.J. Sullivan and R.L. Baker, "Efficient quadtree coding of images and video," in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, Toronto, Canada, pp. 2661-2664, May 1991.



31. G.J. Sullivan and R.L. Baker, "Rate-distortion optimized motion compensation for video compression using fixed or variable size blocks," in Global Telecomm. Conf. (GLOBECOM '91), pp. 85-90, Dec. 1991.
32. B. Girod, "Rate-constrained motion estimation," in Proceedings of the SPIE Conference on Visual Communications and Image Processing, Chicago, USA, pp. 1026-1034, Sept. 1994.
33. ITU-T SG16/Q15 (T. Gardos, ed.), "Video codec test model number 10 (TMN-10)," ITU-T SG16/Q15 document Q15-D-65 (downloadable via ftp://standard.pictel.com/video-site), Apr. 1998.
34. J. Choi and D. Park, "A stable feedback control of the buffer state using the controlled Lagrange multiplier method," IEEE Transactions on Image Processing, vol. 3, no. 5, pp. 546-558, Sept. 1994.
35. A. Ortega, K. Ramchandran, and M. Vetterli, "Optimal trellis-based buffered compression and fast approximations," IEEE Transactions on Image Processing, vol. 3, no. 1, pp. 26-40, Jan. 1994.
36. J. Ribas-Corbera and S. Lei, "Rate control for low-delay video communications," ITU-T SG16/Q15 document Q15-A-20 (downloadable via ftp://standard.pictel.com/video-site), June 1997.
37. M.C. Chen and A.N. Willson, "Rate-distortion optimal motion estimation for video coding," in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, Atlanta, USA, vol. 4, pp. 2096-2099, May 1996.
38. M.C. Chen and A.N. Willson, "Design and optimization of a differentially coded variable block size motion compensation system," in Proceedings of the IEEE International Conference on Image Processing, Lausanne, Switzerland, vol. 3, pp. 259-262, Sept. 1996.
39. M.C. Chen and A.N. Willson, "Rate-distortion optimal motion estimation algorithms for motion-compensated transform video coding," IEEE Transactions on Circuits and Systems for Video Technology, vol. 8, no. 2, pp. 147-158, Apr. 1998.
40. W.C. Chung, F. Kossentini, and M.J.T. Smith, "An efficient motion estimation technique based on a rate-distortion criterion," in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, Atlanta, USA, vol. 4, pp. 1926-1929, May 1996.
41. F. Kossentini, Y.-W. Lee, M.J.T. Smith, and R. Ward, "Predictive RD optimized motion estimation for very low bit rate video coding," IEEE Journal on Selected Areas in Communications, vol. 15, no. 9, pp. 1752-1763, Dec. 1997.
42. G.M. Schuster and A.K. Katsaggelos, "A video compression scheme with optimal bit allocation among segmentation, motion, and residual error," IEEE Transactions on Image Processing, vol. 6, pp. 1487-1502, Nov. 1997.
43. B. Girod, "The efficiency of motion-compensating prediction for hybrid coding of video sequences," IEEE Journal on Selected Areas in Communications, vol. 5, no. 7, pp. 1140-1154, Aug. 1987.
44. B. Girod, "Motion-compensating prediction with fractional-pel accuracy," IEEE Transactions on Communications, vol. 41, no. 4, pp. 604-612, Apr. 1993.
45. H.G. Musmann, P. Pirsch, and H.-J. Grallert, "Advances in picture coding," Proceedings of the IEEE, vol. 73, no. 4, pp. 523-548, Apr. 1985.
46. B. Girod, "Efficiency analysis of multi-hypothesis motion-compensated prediction for video coding," IEEE Transactions on Image Processing, 1997, submitted for publication.
47. M. Flierl, T. Wiegand, and B. Girod, "A locally optimal design algorithm for block-based multi-hypothesis motion-compensated prediction," in Proceedings of the Data Compression Conference, Snowbird, USA, Mar. 1998.
48. T. Wiegand, M. Flierl, and B. Girod, "Entropy-constrained linear vector prediction for motion-compensated video coding," in Proceedings of the IEEE International Symposium on Information Theory, Boston, USA, Aug. 1998.
49. T. Wiegand, X. Zhang, and B. Girod, "Long-term memory motion-compensated prediction," IEEE Transactions on Circuits and Systems for Video Technology, Sept. 1998.
50. Nokia Research Center (P. Haavisto, et al.), "Proposal for ...," ISO/IEC JTC1/SC29/WG11, MPEG document MPEG96/M0904, July 1996.
51. M. Karczewicz, J. Nieweglowski, and P. Haavisto, "Video coding using motion compensation with polynomial motion vector fields," Signal Processing: Image Communication, vol. 10, pp. 63-91, 1997.
52. T. Wiegand, M. Lightstone, T.G. Campbell, and S.K. Mitra, "Efficient mode selection for block-based motion compensated video coding," in Proceedings of the IEEE International Conference on Image Processing, Washington, D.C., USA, Oct. 1995.
53. T. Wiegand, M. Lightstone, D. Mukherjee, T.G. Campbell, and S.K. Mitra, "Rate-distortion optimized mode selection for very low bit rate video coding and the emerging H.263 standard," IEEE Transactions on Circuits and Systems for Video Technology, vol. 6, no. 2, pp. 182-190, Apr. 1996.
54. G.M. Schuster and A.K. Katsaggelos, "Fast and efficient mode and quantizer selection in the rate distortion sense for H.263," in Proceedings of the SPIE Conference on Visual Communications and Image Processing, Orlando, USA, pp. 784-795, Mar. 1996.
55. G.J. Sullivan, "Efficient scalar quantization of exponential and Laplacian random variables," IEEE Transactions on Information Theory, vol. 42, no. 5, pp. 1365-1374, Sept. 1996.
56. A. Ortega and K. Ramchandran, "Forward-adaptive quantization with optimal overhead cost for image and video coding with applications to MPEG video coders," in Proceedings of the SPIE, Digital Video Compression: Algorithms and Technologies, San Jose, USA, Feb. 1995.
57. J. Wen, M. Luttrell, and J. Villasenor, "Simulation results on adaptive quantization," ITU-T SG16/Q15 document Q15-D-40 (downloadable via ftp://standard.pictel.com/video-site), Apr. 1998.
58. J. Wen, M. Luttrell, and J. Villasenor, "Trellis-based R-D optimal quantization in H.263+," IEEE Transactions on Image Processing, 1998, submitted for publication.
59. N.S. Jayant and P. Noll, Digital Coding of Waveforms, Prentice-Hall, Englewood Cliffs, USA, 1984.
60. H. Gish and J.N. Pierce, "Asymptotically efficient quantizing," IEEE Transactions on Information Theory, vol. 14, pp. 676-683, Sept. 1968.
61. ITU-T SG16/Q15 (T. Gardos, ed.), "Video codec test model number 9 (TMN-9)," ITU-T SG16/Q15 document Q15-C-15 (downloadable via ftp://standard.pictel.com/video-site), Dec. 1997.
62. T. Wiegand and B.D. Andrews, "An improved H.263 coder using rate-distortion optimization," ITU-T SG16/Q15 document Q15-D-13 (downloadable via ftp://standard.pictel.com/video-site), Apr. 1998.
63. M. Gallant, G. Côté, and F. Kossentini, "Description of and results for a rate-distortion-based coder," ITU-T SG16/Q15 document Q15-D-49 (downloadable via ftp://standard.pictel.com/video-site), Apr. 1998.
