Frames, Fields, Pictures (I, P, B)
MPEG-2 encodes video as a series of pictures. For interlaced sequences, the two fields of a frame may be encoded together as a frame picture or separately as two field pictures. Frame pictures and field pictures may be mixed within a single interlaced sequence. High detail and limited motion favor frame picture encoding. Field pictures always occur in pairs (top-bottom or bottom-top). The output of the decoding process for an interlaced sequence is a series of reconstructed fields.

For progressive sequences, all pictures are frame pictures. The output of the decoding process for a progressive sequence is a series of reconstructed frames.

Encoded pictures are classified into three types: I, P, and B.

I pictures (intra-coded pictures)
- All macroblocks are coded without prediction.
- They are needed to give the receiver a "starting point" for prediction after a channel change and to let it recover from errors.

P pictures (predicted pictures)
- Macroblocks may be coded with forward prediction from references made from previous I and P pictures, or may be intra coded.

B pictures (bidirectionally predicted pictures)
- Macroblocks may be coded with forward prediction from previous I or P references.
- Macroblocks may be coded with backward prediction from the next I or P reference.
- Macroblocks may be coded with interpolated prediction from past and future I or P references.
- Macroblocks may be intra coded (no prediction).

Note that in P and B pictures, macroblocks may be skipped and not sent at all. The decoder then reconstructs them from the anchor reference pictures, with no prediction error added. B pictures are never used as prediction references.
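Because a B picture may be predicted from the next I or P reference, that reference has to reach the decoder before the B pictures that precede it in display order. The following is a minimal sketch of that reordering, not an implementation of the standard: the function names and the "IBBPBBP" pattern are illustrative assumptions, and trailing B pictures with no following anchor are simply emitted last.

```python
def display_to_coded_order(display_types):
    """Reorder pictures from display order to coded (transmission) order.

    display_types: a string such as "IBBPBBP", one letter per picture
    in display order. Returns (display_index, type) tuples in coded order.
    """
    coded = []
    pending_b = []  # B pictures still waiting for their future anchor
    for index, ptype in enumerate(display_types):
        if ptype == "B":
            pending_b.append((index, ptype))
        elif ptype in ("I", "P"):
            # Send the anchor first, then the B pictures that depend on it.
            coded.append((index, ptype))
            coded.extend(pending_b)
            pending_b = []
        else:
            raise ValueError(f"unknown picture type: {ptype!r}")
    # Assumption: leftover B pictures (no later anchor in this list) go last.
    coded.extend(pending_b)
    return coded


def format_order(coded):
    """Render coded order as e.g. 'I0 P3 B1 B2' (type + display index)."""
    return " ".join(f"{ptype}{index}" for index, ptype in coded)


print(format_order(display_to_coded_order("IBBPBBP")))
# prints: I0 P3 B1 B2 P6 B4 B5
```

In the printed result each P picture appears ahead of the two B pictures that are displayed before it, which is why the decoder has to buffer anchors and restore display order on output.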
Frame/picture/block types
MPEG-1 has several frame/picture types that serve different purposes. The most important, yet simplest, is the I-frame.

I-frames
I-frame is an abbreviation for intra-frame, so called because an I-frame can be decoded independently of any other frame. I-frames may also be known as I-pictures, or as keyframes because of their somewhat similar function to the key frames used in animation. I-frames can be considered effectively identical to baseline JPEG images.[10]

High-speed seeking through an MPEG-1 video is only possible to the nearest I-frame. When cutting a video, it is not possible to start playback of a segment before the first I-frame in the segment (at least not without computationally intensive re-encoding). For this reason, I-frame-only MPEG videos are used in editing applications.

I-frame-only compression is very fast but produces very large files: a factor of 3 (or more) larger than normally encoded MPEG-1 video, depending on how temporally complex a specific video is.[2] I-frame-only MPEG-1 video is very similar to MJPEG video, so much so that very high-speed and theoretically lossless conversion (in reality, there are rounding errors) can be made from one format to the other, provided a couple of restrictions (color space and quantization matrix) are followed in the creation of the bitstream.[44]

The length between I-frames is known as the group of pictures (GOP) size. MPEG-1 most commonly uses a GOP size of 15-18, i.e. 1 I-frame for every 14-17 non-I-frames (some combination of P- and B-frames). With more intelligent encoders, GOP size is dynamically chosen, up to some pre-selected maximum limit.[10]

Limits are placed on the maximum number of frames between I-frames due to decoding complexity, decoder buffer size, recovery time after data errors, seeking ability, and accumulation of IDCT errors in the low-precision implementations most common in hardware decoders (see IEEE-1180).

P-frames
P-frame is an abbreviation for predicted-frame. P-frames may also be called forward-predicted frames or inter-frames (B-frames are also inter-frames).

P-frames exist to improve compression by exploiting the temporal (over time) redundancy in a video. A P-frame stores only the difference in image from the closest preceding I-frame or P-frame; this reference frame is also called the anchor frame.

The difference between a P-frame and its anchor frame is calculated using motion vectors on each macroblock of the frame (see below). This motion vector data is embedded in the P-frame for use by the decoder.
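The idea of coding a macroblock as a motion vector plus a difference can be illustrated with a small sketch. This is a simplified assumption-laden example, not the MPEG-1 algorithm: it uses a hypothetical 2x2 block instead of a 16x16 macroblock, takes the motion vector as given rather than searching for it, assumes the vector stays inside the frame, and stores the difference directly, whereas a real encoder would transform and quantize it.

```python
def get_block(frame, top, left, size):
    """Return the size x size block whose top-left corner is (top, left)."""
    return [row[left:left + size] for row in frame[top:top + size]]


def encode_block(current, reference, top, left, size, motion_vector):
    """Encoder side: difference between the current block and the
    block the motion vector points to in the anchor (reference) frame."""
    dy, dx = motion_vector
    predicted = get_block(reference, top + dy, left + dx, size)
    actual = get_block(current, top, left, size)
    return [[a - p for a, p in zip(a_row, p_row)]
            for a_row, p_row in zip(actual, predicted)]


def decode_block(reference, top, left, size, motion_vector, residual):
    """Decoder side: fetch the predicted block and add the difference back."""
    dy, dx = motion_vector
    predicted = get_block(reference, top + dy, left + dx, size)
    return [[p + r for p, r in zip(p_row, r_row)]
            for p_row, r_row in zip(predicted, residual)]


# Anchor frame: a bright 2x2 object in the top-left corner.
reference = [[9, 9, 0, 0],
             [9, 9, 0, 0],
             [0, 0, 0, 0],
             [0, 0, 0, 0]]
# Current frame: the object has moved to the bottom-right, one pixel changed.
current = [[0, 0, 0, 0],
           [0, 0, 0, 0],
           [0, 0, 9, 9],
           [0, 0, 9, 8]]

mv = (-2, -2)  # the block at (2, 2) came from (0, 0) in the anchor frame
residual = encode_block(current, reference, 2, 2, 2, mv)
print(residual)                                        # [[0, 0], [0, -1]]
print(decode_block(reference, 2, 2, 2, mv, residual))  # [[9, 9], [9, 8]]
```

The residual is almost all zeros, which is what makes the predicted block cheap to code compared with sending the pixels themselves.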
A P-frame can contain any number of intra-coded blocks, in addition to any forward-predicted blocks.[45]

If a video drastically changes from one frame to the next (such as a cut), it is more efficient to encode it as an I-frame.

B-frames
B-frame stands for bidirectional-frame. B-frames may also be known as backwards-predicted frames or B-pictures. B-frames are quite similar to P-frames, except that they can make predictions using both the previous and future frames (i.e. two anchor frames).

It is therefore necessary for the player to first decode the next I- or P- anchor frame, which follows the B-frame in display order, before the B-frame itself can be decoded and displayed. This means decoding B-frames requires larger data buffers and causes an increased delay during both decoding and encoding. This also necessitates the decoding time stamps (DTS) feature in the container/system stream (see above). As such, B-frames have long been the subject of much controversy; they are often avoided in videos and are sometimes not fully supported by hardware decoders.

No other frames are predicted from a B-frame. Because of this, a very low bitrate B-frame can be inserted, where needed, to help control the bitrate. If this were done with a P-frame, future P-frames would be predicted from it, which would lower the quality of the entire sequence. However, the future P-frame must still encode all the changes between it and the previous I- or P- anchor frame. B-frames can also be beneficial in videos where the background behind an object is being revealed over several frames, or in fading transitions, such as scene changes.[2][10]

A B-frame can contain any number of intra-coded blocks and forward-predicted blocks, in addition to backwards-predicted or bidirectionally predicted blocks.[10][45]

D-frames
MPEG-1 has a unique frame type not found in later video standards. D-frames or DC-pictures are independent images (intra-frames) that have been encoded using DC transform coefficients only (AC coefficients are removed when encoding D-frames; see DCT below) and hence are very low quality. D-frames are never referenced by I-, P- or B-frames. D-frames are only used for fast previews of video, for instance when seeking through a video at high speed.[2]

Given moderately higher-performance decoding equipment, fast preview can be accomplished by decoding I-frames instead of D-frames. This provides higher-quality previews, since I-frames contain AC coefficients as well as DC coefficients. If the encoder can assume that rapid I-frame decoding capability is available in decoders, it can save bits by not sending D-frames (thus improving compression of the video content). For this reason, D-frames are seldom actually used in MPEG-1 video encoding, and the D-frame feature has not been included in any later video coding standards.
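To see why a DC-only picture is so coarse, note that the DC coefficient of an 8x8 DCT block is proportional to the block's average value, so dropping every AC coefficient leaves one flat value per block. The sketch below illustrates this under stated assumptions: it uses the DCT scaling in which the DC term equals the block sum divided by 8, ignores quantization entirely, and the function names are hypothetical.

```python
N = 8  # DCT block size


def dc_coefficient(block):
    """DC term of the 2-D DCT of an N x N block (block sum / N)."""
    return sum(sum(row) for row in block) / N


def reconstruct_from_dc(dc):
    """Inverse DCT with all AC coefficients zero: a flat block of dc / N."""
    return [[dc / N] * N for _ in range(N)]


# A block with a diagonal gradient: values run from 0 to 14.
block = [[x + y for x in range(N)] for y in range(N)]

dc = dc_coefficient(block)
flat = reconstruct_from_dc(dc)
print(dc)          # 56.0
print(flat[0][0])  # 7.0, the mean of the original block
```

The gradient in the original block is lost completely; only its mean survives, which is acceptable for a fast preview but not for normal viewing.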