Basic Video Compression Techniques
Introduction to Video Compression
The need for video compression
Introduction to Video Compression
• A video consists of a time-ordered sequence of frames,
i.e., images.
• An obvious solution is to compress the video by applying an
image compression algorithm to each frame, for instance
compressing each frame as a JPEG image.
Introduction to Video Compression
• Consecutive frames in a video are similar — temporal
redundancy exists.
• Significantly higher compression rates can be achieved by
exploiting temporal redundancy.
Introduction to Video Compression
Video compression utilizes two basic techniques:
• Intraframe compression
• Occurs within individual frames
• Designed to minimize the duplication of data in each picture
(Spatial Redundancy)
• Interframe compression
• Compression between frames
• Designed to minimize data redundancy between successive pictures
(Temporal redundancy)
Introduction to Video Compression
• Temporal redundancy arises when successive frames of
video display images of the same scene.
• It is common for the content of the scene to remain fixed
or to change only slightly between successive frames.
• Spatial redundancy occurs because parts of the picture
are often replicated (with minor changes) within a single
frame of video.
Temporal Redundancy
• Temporal redundancy is exploited so that not every
frame of the video needs to be coded independently as a
new image.
• It makes more sense to code only the changed
information from frame to frame rather than coding the
whole frame.
With difference coding, only the first image (I-frame) is coded in its entirety. In the two
following images (P-frames), references are made to the first picture for the static elements, i.e.
the house. Only the moving parts, i.e. the running man, are coded using motion vectors, thus
reducing the amount of information that is sent and stored.
Temporal Redundancy
• Temporal redundancy can be better exploited by
predictive coding based on previous frames:
• Predicting the motion of pixels and regions from frame to
frame, rather than predicting the frame as a whole.
• Compression proceeds by subtracting images: the difference
between the current frame and other frame(s) in the sequence
consists of small values with low entropy, which is good for
compression.
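The idea of difference coding can be sketched with a toy example: subtracting the previous frame from the current one leaves nonzero values only where something moved. The two 8 x 8 "frames" below are hypothetical synthetic data, not real video.

```python
# Two synthetic 8x8 grayscale "frames" (hypothetical data): the scene is
# static except for a small object that moves one pixel to the right.
W = H = 8
prev_frame = [[0] * W for _ in range(H)]
curr_frame = [[0] * W for _ in range(H)]
for r in (2, 3):
    prev_frame[r][1] = prev_frame[r][2] = 200   # object in frame n
    curr_frame[r][2] = curr_frame[r][3] = 200   # shifted right in frame n+1

# Difference coding: code only what changed between the frames.
diff = [[curr_frame[r][c] - prev_frame[r][c] for c in range(W)]
        for r in range(H)]

nonzero = sum(1 for row in diff for v in row if v != 0)
print(nonzero, "nonzero difference pixels out of", W * H)  # 4 out of 64
```

Since most difference values are zero, the difference image has low entropy and compresses far better than coding the full frame again.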
Temporal Redundancy
Pixel motion prediction
• It can be done even better by searching
for just the right parts of the image to
subtract from the previous frame.
• Practically, the coherency from frame to
frame is better exploited by observing that
groups of contiguous pixels, rather than
individual pixels, move together with the
same motion vector.
• Therefore, it makes more sense to predict
frame n+1 in the form of regions or blocks
rather than individual pixels.
Pixel motion prediction
• The pixel Cn(x, y) shown in the frame n has moved to a new location
in frame n+1. Consequently, Cn+1(x, y) in frame n+1 is not the same as
Cn(x, y) but is offset by the motion vector (dx,dy).
• The small error difference is e(x, y) = Cn+1(x, y) - Cn(x+dx, y+dy).
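The effect of the motion vector on the prediction error can be sketched in one dimension: with the correct offset, the error drops to zero for a pure translation. The pixel values below are a hypothetical example.

```python
# Hypothetical 1-D row of pixels in frame n; in frame n+1 the pattern
# has shifted right by two pixels, so C_{n+1}(x) = C_n(x + dx) with dx = -2.
frame_n  = [10, 20, 30, 40, 50, 60, 70, 80]
dx = -2
frame_n1 = [0, 0] + frame_n[:-2]

# Without motion compensation: e(x) = C_{n+1}(x) - C_n(x) is large.
err_plain = [frame_n1[x] - frame_n[x] for x in range(2, 8)]

# With the correct motion vector: e(x) = C_{n+1}(x) - C_n(x + dx) is zero.
err_mc = [frame_n1[x] - frame_n[x + dx] for x in range(2, 8)]

print(err_plain)  # [-20, -20, -20, -20, -20, -20]
print(err_mc)     # [0, 0, 0, 0, 0, 0]
```

Real video adds noise and non-translational motion, so the motion-compensated error is small rather than exactly zero, but it still has far lower entropy than the raw difference.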
Video Compression with Motion Compensation
Steps of Video compression based on Motion
Compensation (MC):
• MC-based Prediction.
• Motion Estimation (motion vector search).
• Derivation of the prediction error, i.e., the difference.
Motion Compensation
• It is an algorithmic technique used to predict a frame in a
video, given the previous and/or future frames.
• Motion compensation describes a picture in terms of the
transformation of a reference picture to the current
picture.
• The reference picture may be previous in time or even
from the future.
Motion Compensation
How it works
• It exploits the fact that, often, for many frames of a
movie, the only difference between one frame and
another is the result of either the camera moving or an
object in the frame moving.
• In reference to a video file, this means much of the
information that represents one frame will be the same
as the information used in the next frame.
• Using motion compensation, a video stream will contain
some full (reference) frames; then the only information
stored for the frames in between would be the
information needed to transform the previous frame into
the next frame.
Motion Compensation
• Each image is divided into macroblocks of size N x N.
• By default, N = 16 for luminance images. For
chrominance images, N = 8 if 4:2:0 chroma
subsampling is adopted.
Motion Compensation
Motion compensation is performed at the macroblock level.
• The current image frame is referred to as the Target Frame.
• A match is sought between the macroblock in the Target
Frame and the most similar macroblock in previous and/or
future frame(s) (referred to as Reference frame(s) ).
• The displacement of the reference macroblock to the target
macroblock is called a motion vector MV.
• Figure 5.1 shows the case of forward prediction in which
the Reference frame is taken to be a previous frame.
Fig. 5.1: Macroblocks and Motion Vector in Video Compression.
• MV search is usually limited to a small immediate
neighborhood — both horizontal and vertical displacements
in the range [−p, p].
This makes a search window of size (2p + 1) x (2p + 1).
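The exhaustive search over this window can be sketched as a minimal full-search block matcher using the sum of absolute differences (SAD) as the match criterion. The tiny frames below are hypothetical; real encoders work on full-size frames and often use faster, suboptimal search strategies.

```python
def sad(target, ref, tx, ty, rx, ry, n):
    """Sum of absolute differences between the NxN target macroblock at
    (tx, ty) and a candidate reference macroblock at (rx, ry)."""
    return sum(abs(target[ty + j][tx + i] - ref[ry + j][rx + i])
               for j in range(n) for i in range(n))

def full_search(target, ref, tx, ty, n, p):
    """Return the motion vector (dx, dy) in [-p, p] x [-p, p] that
    minimizes SAD; here the vector points from the target block's
    position to its best match in the reference frame."""
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-p, p + 1):
        for dx in range(-p, p + 1):
            rx, ry = tx + dx, ty + dy
            if 0 <= rx <= w - n and 0 <= ry <= h - n:  # stay inside frame
                cost = sad(target, ref, tx, ty, rx, ry, n)
                if best is None or cost < best[0]:
                    best = (cost, dx, dy)
    return best[1], best[2]

# Hypothetical 8x8 frames: a 2x2 bright block moves from (1, 1) to (3, 2).
ref = [[0] * 8 for _ in range(8)]
tgt = [[0] * 8 for _ in range(8)]
ref[1][1] = ref[1][2] = ref[2][1] = ref[2][2] = 255
tgt[2][3] = tgt[2][4] = tgt[3][3] = tgt[3][4] = 255

print(full_search(tgt, ref, 3, 2, 2, 2))  # (-2, -1)
```

Full search evaluates all (2p + 1)^2 candidates, which is why practical encoders replace it with logarithmic or hierarchical searches.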
Size of Macroblocks
• Smaller macroblocks increase the number of blocks in the
target frame, which means a larger number of motion vectors to
predict the target frame. This requires more bits to compress the
motion vectors, but smaller macroblocks tend to decrease the
prediction error.
• Larger macroblocks mean fewer motion vectors to compress,
but also tend to increase the prediction error. This is because
larger areas could possibly cover more than one moving region
within a large macroblock.
Video compression standard H.261
H.261: An early digital video compression standard; its
principle of MC-based compression is retained in all
later video compression standards.
• The standard was designed for videophone, video
conferencing and other audiovisual services over ISDN.
• The video codec supports bit-rates of p x 64 kbps,
where p ranges from 1 to 30 (hence also known as p *
64).
• It requires that the delay of the video encoder be less
than 150 msec so that the video can be used for real-
time video conferencing.
H.261 Frame Sequence
Two types of image frames are defined:
• Intra-frames (I-frames) and Inter-frames (P-frames):
H.261 Frame Sequence
• I-frames:
• These frames are intra-coded: only spatial redundancy is used to
compress the frame.
• They are treated as independent images (can be reconstructed without
any reference to other frames).
• A transform coding method similar to JPEG is applied within each I-frame.
• I-frames require more bits for compression than predicted
frames, so their compression ratio is not as high.
H.261 Frame Sequence
• P-frames:
• P-frames are predictively coded (forward predictive coding),
exploiting temporal redundancy by comparing them with a
preceding reference frame.
• They are not independent (it is impossible to reconstruct them without the
data of another frame (I or P)).
• They contain the motion vectors and error signals.
• P-frames need less space than I-frames, because only the differences
are stored. However, they are expensive to compute, but are
necessary for compression.
• An important problem the encoder faces is when to stop predicting
using P-frames, and instead insert an I-frame.
• An I-frame needs to be inserted where P-frames cannot give much
compression.
• This happens during scene transitions or scene changes, where the
prediction errors are large.
Fig. 5.4: H.261 Frame Sequence.
• We typically have a group of pictures (GOP): one I-frame followed by
several P-frames.
• The number of P-frames following each I-frame determines the size of
the GOP; it can be fixed or dynamic. Why can't it be too large?
H.261 Frame Sequence
• Temporal redundancy removal is included in P-frame
coding, whereas I-frame coding performs only
spatial redundancy removal.
• A lost P-frame usually results in artifacts that are folded
into subsequent frames. If an artifact persists over time,
then the likely cause is a lost P-frame.
• To avoid propagation of coding errors, an I-frame is
usually sent a couple of times in each second of the
video.
Intra-frame (I-frame) Coding
• Uses various lossless and lossy compression techniques,
like JPEG.
• Compression is contained only within the current frame.
• Simpler coding, but not enough by itself for high
compression.
• We cannot rely on intra-frame coding alone: not enough
compression.
• However, we also cannot rely on inter-frame differences across a
large number of frames.
• So when the errors get too large, a new I-frame is started.
Intra-frame (I-frame) Coding
• Macroblocks are of size 16 x 16 pixels for the Y frame, and 8 x
8 for Cb and Cr frames, since 4:2:0 chroma subsampling is
employed. A macroblock consists of four Y, one Cb, and one Cr
8 x 8 blocks.
• For each 8 x 8 block a DCT transform is applied; the DCT
coefficients then go through quantization, zigzag scan and
entropy coding.
Block Transform Encoding (I-frame)
Inter-frame (P-frame) Predictive Coding
The H.261 P-frame coding scheme based on motion
compensation:
• For each macroblock in the Target frame, a motion
vector is allocated.
• After the prediction, a difference macroblock is
derived to measure the prediction error.
• Each of the resulting 8 x 8 blocks goes through DCT,
quantization, zigzag scan and entropy coding
procedures.
Inter-frame (P-frame) Predictive Coding
• The P-frame coding encodes the difference macroblock
(not the Target macroblock itself).
• Sometimes, a good match cannot be found, i.e., the
prediction error exceeds a certain acceptable level. The
MB itself is then encoded (treated as an intra MB) and
in this case it is termed a non-motion-compensated MB.
(In order to minimize the number of expensive motion estimation
calculations, they are only calculated if the difference between two
blocks at the same position is higher than a threshold; otherwise the
whole block is transmitted.)
• For each motion vector, only the difference MVD is sent for
entropy coding.
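The mode decision and the motion-vector-difference idea can be sketched as below; the threshold value and the motion vectors are hypothetical, not taken from the H.261 specification.

```python
def choose_mode(prediction_error, max_acceptable=512):
    """If no good match is found (prediction error exceeds the acceptable
    level), the MB is coded as an intra MB; the threshold is a made-up
    value for illustration."""
    if prediction_error > max_acceptable:
        return "intra MB"
    return "motion-compensated MB"

def motion_vector_difference(mv, mv_prev):
    """Only the MVD (difference from the previously coded MV) is sent
    for entropy coding; neighboring blocks tend to move alike, so MVDs
    are small and compress well."""
    return (mv[0] - mv_prev[0], mv[1] - mv_prev[1])

print(choose_mode(2000))                           # intra MB
print(choose_mode(40))                             # motion-compensated MB
print(motion_vector_difference((5, -3), (4, -3)))  # (1, 0)
```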
Fig. 10.6: H.261 P-frame Coding Based on Motion Compensation.
Video compression standard MPEG
• MPEG: Moving Pictures Experts Group, established in
1988 for the development of digital video.
• MPEG compression is essentially an attempt to overcome
some shortcomings of H.261:
• H.261 only encodes video. MPEG-1 encodes video and
audio.
• H.261 only allows forward prediction. MPEG-1 has
forward and backward prediction (B-pictures).
• MPEG-1 was designed to allow a fast forward and backward
search and a synchronization of audio and video.
Motion Compensation in MPEG-1
As mentioned before, Motion Compensation (MC) based
video encoding in H.261 works as follows:
• In Motion Estimation (ME), each macroblock (MB) of
the Target P-frame is assigned a best matching MB
from the previously coded I- or P-frame (prediction).
• Prediction error: the difference between the MB and
its matching MB is sent to the DCT and its subsequent
encoding steps.
• The prediction is from a previous frame: forward
prediction.
Motion Compensation in MPEG-1
• Sometimes, areas of the current frame can be better
predicted from the next (future) frame. This might happen
when objects move or the camera moves, exposing areas not
seen in past frames.
The MB containing part of a ball in the Target frame cannot find a good matching MB in
the previous frame because half of the ball was occluded by another object. A match
however can readily be obtained from the next frame.
The Need for a Bidirectional Search
The problem here is that many macroblocks need information that is
not in the reference frame.
• Occlusion by objects affects differencing
• Difficult to track occluded objects etc.
• MPEG uses forward/backward interpolated prediction.
• Using both frames increases the correctness in prediction during
motion compensation.
• The past and future reference frames can themselves be coded
as an I or a P frame.
MPEG B-Frames
• The MPEG solution is to add a third frame type: the
bidirectional frame, or B-frame.
MPEG B-Frames
• B-frames, also known as bidirectionally coded frames, are
inter-coded and also exploit temporal redundancy.
• To predict a B-frame, the previous or past frame and the
next or future frame are used.
• The coding of B-frames is more complex compared with I-
or P-frames, with the encoder having to make more decisions.
MPEG B-Frames
• To compute a matching macroblock, the encoder needs to
search for the best motion vector in the past reference
frame and also for the best motion vector in the future
reference frame. Two motion vectors are computed for
each macroblock.
• The macroblock gets coded in one of three modes:
• Forward predicted, using only the past frame
• Backward predicted, using only the future frame
• Interpolated, using both, by averaging the two predicted
blocks
• The case corresponding to the best macroblock match
and yielding the least entropy in the difference is chosen.
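The three-way mode decision can be sketched as follows, using SAD over hypothetical 4-pixel blocks as a stand-in for the "least entropy" criterion.

```python
def sad(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def code_b_block(target, fwd_pred, bwd_pred):
    """Choose among the three B-frame modes by picking the prediction
    with the smallest error (SAD here stands in for 'least entropy')."""
    interp = [(f + b) // 2 for f, b in zip(fwd_pred, bwd_pred)]  # average
    candidates = {
        "forward": fwd_pred,
        "backward": bwd_pred,
        "interpolated": interp,
    }
    return min(candidates, key=lambda m: sad(target, candidates[m]))

# Hypothetical 4-pixel "blocks": the target sits halfway between the
# forward and backward predictions, so interpolation wins.
target = [100, 100, 100, 100]
fwd    = [ 90,  90,  90,  90]
bwd    = [110, 110, 110, 110]
print(code_b_block(target, fwd, bwd))   # interpolated
```

When occlusion makes one direction useless, its SAD is large and the other single-direction mode (or interpolation) is chosen instead.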
Backward Prediction Implications
B-frames also necessitate reordering frames during
transmission, which causes delays:
• The order in which frames arrive at the encoder is known
as the display order. This is also the order in which the
frames need to be displayed at the decoder after decoding.
• B-frames induce a forward and backward dependency. The
encoder has to encode and send to the decoder both the
future and past reference frames before coding and
transmitting the current B-frame.
• Because of the change in the order, all potential B-frames
need to be buffered while the encoder codes the future
reference frame, forcing the encoder to deal with buffering
and also causing a delay during transmission.
Backward Prediction Implications
Example: backward prediction requires that the future
frames to be used for backward prediction be
encoded and transmitted first, i.e., out of order.
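The reordering from display order to coding/transmission order can be sketched as below; the frame labels are hypothetical.

```python
def coding_order(display_order):
    """Reorder a display-order GOP so every B-frame is sent after both of
    its reference frames (the preceding and the following I/P-frame)."""
    out, pending_b = [], []
    for frame in display_order:
        if frame[0] == "B":
            pending_b.append(frame)   # buffer B-frames ...
        else:
            out.append(frame)         # ... until the next reference
            out.extend(pending_b)     # frame has been sent
            pending_b = []
    return out + pending_b

display = ["I1", "B2", "B3", "P4", "B5", "B6", "P7"]
print(coding_order(display))
# ['I1', 'P4', 'B2', 'B3', 'P7', 'B5', 'B6']
```

The buffered B-frames are exactly the source of the extra encoder memory and transmission delay described above.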
Fig 5.9: MPEG Frame Sequence.
Example encoding patterns:
Pattern 1: IPPPPPPPPPP
Dependency: I <---- P <---- P <---- P ...
• I- frame compressed independently
• First P-frame compressed using I-frame
• Second P-frame compressed using first P-frame
• And so on...
Example encoding patterns:
Pattern 2: I BB P BB P BB P BB P BB P BB P
Dependency: I <---- B B ----> P <---- B B ----> P ...
• I- frame compressed independently
• First P-frame compressed using I-frame
• B-frames between I-frame and first P-frame compressed
using I-frame and first P-frame
• Second P-frame compressed using first P-frame
• B-frames between first P-frame and second P-frame
compressed using first P-frame and second P-frame
• And so on...
Example encoding patterns:
Pattern 3: I BBB P BBB P BBB P BBB P
Dependency: I <---- B B B ----> P <---- B B B ----> P ...
• I- frame compressed independently
• First P-frame compressed using I-frame
• B-frames between I-frame and first P-frame compressed
using I-frame and first P-frame
• Second P-frame compressed using first P-frame
• B-frames between first P-frame and second P-frame
compressed using first P-frame and second P-frame
• And so on...
The quality of an MPEG-video
• The usage of the particular frame types defines the quality and the
compression ratio of the compressed video.
• I-frames increase the quality (and size), whereas the usage of B-
frames compresses better but also produces poorer quality.
• The distance between two I-frames can be seen as a measure of the
quality of an MPEG video.
• There is no defined limit to the number of consecutive B-frames that
may be used in a group of pictures.
• The optimal number is application dependent.
• Most broadcast-quality applications, however, have tended to use
2 consecutive B-frames (I,B,B,P,B,B,P,...) as the ideal trade-off
between compression efficiency and video quality.
MC-based B-frame coding idea (summary)
The MC-based B-frame coding idea is illustrated in Fig. 10.8:
• Each MB from a B-frame will have up to two motion vectors
(MVs) (one from the forward and one from the backward
prediction).
• If matching in both directions is successful, then two MVs
will be sent, and the two corresponding matching MBs
are averaged (indicated by '%' in the figure) before
comparing to the Target MB for generating the prediction
error.
• If an acceptable match can be found in only one of the
reference frames, then only one MV and its corresponding
MB will be used, from either the forward or backward
prediction.
Fig. 5.8: B-frame Coding Based on Bidirectional Motion
Compensation.