The MPEG Standard
MPEG-1 (1992): essentially a digital video player
plays out audio/video streams with the same type of access as a home VCR
MPEG-2 (1995): introduced for the compression and transmission of digital TV signals
still limited interactivity
MPEG-4 (1999): a completely different approach
high level of interactivity
MPEG-7 (2002): only for the description of multimedia content (metadata)
MPEG-4
MPEG-4 addresses the need for:
mixing of natural and synthetic audiovisual information
high interactivity in the presentation of multimedia content
deployment of communication systems for real-time or broadcast delivery of coded data streams
a new approach for describing, coding and presenting a scene
MPEG-4 combines different coding tools for:
audio/video
synthetic objects and graphics
MPEG-4 Objects
The audio/video components of MPEG-4
Objects are coded and transmitted separately, and composed at the decoder site
They can exist independently
Multiple objects can be grouped together to form complex objects
Video and audio can be easily manipulated
Permits choosing appropriate coding tools for audio, video and graphics objects
MPEG-4 Object Based Coding (figure)
MPEG-4 Coding
The scene is composed and rendered at the sender site
video frames and audio are coded, multiplexed and transmitted
tools exist for coding arbitrarily shaped objects
At the receiver the stream is demultiplexed
video and audio are decoded, composed, synchronized and presented as defined at the sender's site
Object Coding
Objects are described mathematically (e.g. by their positions)
similarly for audio and graphics objects
an object need only be defined once
the viewer can change its position
only the calculations needed to update the scene are transmitted to the receiver
this is a critical feature when the response has to be fast and bit-rate is limited
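As an illustration only (not the MPEG-4 syntax), a minimal Python sketch of this idea: once an object has been transmitted, later changes travel as small update commands that the receiver applies to its local scene. The class and function names are hypothetical.

```python
# Hedged sketch: receiver-side scene state plus small update commands,
# instead of retransmitting whole objects. Names are illustrative.

class SceneObject:
    def __init__(self, obj_id, position):
        self.obj_id = obj_id
        self.position = position  # (x, y) placement in the scene

# Each object was transmitted and decoded once.
scene = {1: SceneObject(1, (0, 0)), 2: SceneObject(2, (100, 40))}

def apply_update(scene, update):
    """Apply a small update command rather than re-sending the object."""
    scene[update["obj_id"]].position = update["position"]

# A few bytes describing the change replace a full re-encode of the object.
apply_update(scene, {"obj_id": 2, "position": (120, 40)})
print(scene[2].position)  # (120, 40)
```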
Binary Format for Scenes (BIFS)
MPEG-4's language for describing and dynamically changing a scene
Borrows concepts from VRML
Both define representations of the same data
VRML defines objects and actions in text
BIFS code is binary (10-15 times shorter)
Unlike VRML, MPEG-4 uses BIFS for real-time streaming: a scene can be built up and played on the fly
VRML and BIFS evolve consistently
Scene graph (figure)
The Scene Graph
Represents a scene as independent or compound objects, e.g.:
father and child
the audio track of his voice
floor and walls (sprites: for backgrounds)
the web site
the synthetic image of the furniture
a synthetic HDTV set playing a movie from the family's DVD library
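A small Python sketch of such a scene graph may help fix the idea: compound nodes group independent media objects, and each leaf could carry its own elementary streams. The node classes below are hypothetical, not BIFS node types.

```python
# Illustrative scene graph for the example above; not BIFS syntax.

class Node:
    def __init__(self, name, children=None):
        self.name = name
        self.children = children or []

    def walk(self, depth=0):
        """Print the tree: each line is one independent or compound object."""
        print("  " * depth + self.name)
        for child in self.children:
            child.walk(depth + 1)

scene = Node("scene", [
    Node("background sprite (floor and walls)"),
    Node("person", [Node("video object: father and child"),
                    Node("audio object: voice track")]),
    Node("synthetic furniture (3D graphics)"),
    Node("HDTV set", [Node("video object: movie from the DVD library")]),
])
scene.walk()
```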
Elementary Streams (ES)
The scheme for preparing content for transmission, storage and decoding
Objects are placed in ESs
Probably two or more ESs per object
A sound track or a video may have a single ES
Scalable objects may have one ES for basic-quality information plus one or more enhancement layers for improved quality (e.g., finer detail, faster motion)
ESs are split into packets and sent along with timing information for proper synchronization
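A hedged Python sketch of the packetization step described above: payload bytes of one ES are cut into packets that carry the stream id and a timestamp for resynchronization. The field names and packet size are illustrative, not the MPEG-4 SL-packet syntax.

```python
# Minimal sketch: split an elementary stream into timestamped packets.

def packetize(es_id, payload, timestamp, packet_size=188):
    packets = []
    for offset in range(0, len(payload), packet_size):
        packets.append({
            "es_id": es_id,           # which elementary stream this belongs to
            "timestamp": timestamp,   # lets the receiver resynchronize objects
            "data": payload[offset:offset + packet_size],
        })
    return packets

# One basic-quality stream and one enhancement stream of the same scalable object.
base = packetize(es_id=1, payload=b"\x00" * 1000, timestamp=0.0)
enh  = packetize(es_id=2, payload=b"\x00" * 400,  timestamp=0.0)
print(len(base), len(enh))  # number of packets per stream
```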
Object Descriptors (OD)
MPEG-4's mechanism that informs the system which ES belongs to a certain object
ODs contain Elementary Stream Descriptors (ESDs), which tell the system which decoders to use
ODs are sent in their own stream, which allows them to be added or deleted as the scene changes
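The relationship can be pictured with a small Python sketch: an object descriptor points at the elementary streams of one object, and each ES descriptor names the decoder needed. The field names are simplified assumptions, not the normative OD/ESD syntax.

```python
# Hedged sketch of the OD -> ESD relationship described above.

from dataclasses import dataclass, field
from typing import List

@dataclass
class ESDescriptor:
    es_id: int
    decoder: str            # e.g. "mpeg4-video", "aac-audio"

@dataclass
class ObjectDescriptor:
    object_id: int
    es_descriptors: List[ESDescriptor] = field(default_factory=list)

# The OD stream can add or delete descriptors as the scene changes.
od = ObjectDescriptor(object_id=7, es_descriptors=[
    ESDescriptor(es_id=1, decoder="mpeg4-video"),
    ESDescriptor(es_id=2, decoder="aac-audio"),
])
print([d.decoder for d in od.es_descriptors])
```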
Profiles and Levels
MPEG-4 provides a set of tools for coding multimedia content
an application may use only subsets of these tools
Profiles: MPEG-4's definitions of these subsets for audio, visual and graphics information
Levels: define the computational complexity of a profile's tool subset
Certain combinations of profiles fit well together
MPEG-4 Profiles (figure)
MPEG-4 Visual Objects
Arbitrarily shaped objects are coded apart from their background
Binary shape coding: a pixel either is or is not part of an object
a simple, crude technique suitable for low bit-rates; suffers from aliasing
Alpha shape (gray-scale) coding: each pixel is assigned a value for its transparency
objects can be smoothly blended into a background or with other objects
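A minimal Python sketch of the difference between the two shape-coding modes, with arrays standing in for decoded images; the values and threshold are illustrative only.

```python
# Binary vs. gray-scale (alpha) shape when compositing an object onto a background.

import numpy as np

background = np.full((4, 4), 50.0)    # decoded background pixels
obj        = np.full((4, 4), 200.0)   # decoded object pixels

# Binary shape: each pixel either belongs to the object or it does not.
binary_mask = np.array([[0, 1, 1, 0]] * 4, dtype=float)
hard = binary_mask * obj + (1 - binary_mask) * background   # hard edges (aliasing)

# Alpha shape: per-pixel transparency allows smooth blending.
alpha = np.array([[0.0, 0.5, 1.0, 0.5]] * 4)
soft = alpha * obj + (1 - alpha) * background               # object fades into the background

print(hard[0], soft[0])
```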
Visual Objects
Rectangular natural images and scenes are coded using MPEG-1/2
Texture is coded separately by a DCT-based, block-based coding scheme or by wavelets
E.g., weather reports: the weatherman appears to be standing in front of a map which is actually generated elsewhere
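To make the block-based texture coding concrete, here is a small Python sketch of an 8x8 DCT applied to one texture block, written directly with an orthonormal DCT-II matrix; it illustrates the transform step only, not the quantization or entropy coding that follows.

```python
# Sketch: forward and inverse 2-D DCT of one 8x8 texture block.

import numpy as np

N = 8
k, n = np.meshgrid(np.arange(N), np.arange(N), indexing="ij")
C = np.sqrt(2.0 / N) * np.cos(np.pi * (2 * n + 1) * k / (2 * N))
C[0, :] = np.sqrt(1.0 / N)            # orthonormal DCT-II basis matrix

block = np.random.default_rng(0).integers(0, 256, (N, N)).astype(float)
coeffs = C @ block @ C.T              # forward transform: energy compacts into few coefficients
reconstructed = C.T @ coeffs @ C      # inverse transform

print(np.allclose(block, reconstructed))   # True: lossless before quantization
```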
Object Segmentation
MPEG does not specify how objects are extracted
video object segmentation is difficult
e.g., record the weatherman's image in front of a colored background
MPEG-4 specifies decoding
implementation of encoding is left to the industry to decide
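Since the encoder side is non-normative, one common approach hinted at above is chroma keying. A hedged Python sketch, with an illustrative threshold and toy arrays:

```python
# Sketch of chroma-key segmentation: keep pixels that differ from a uniform
# studio background color. Threshold and data are illustrative.

import numpy as np

key_color = np.array([0, 255, 0])                 # uniform studio background (green)
frame = np.zeros((2, 3, 3), dtype=float)
frame[:, :2] = key_color                          # background pixels
frame[:, 2] = [180, 120, 90]                      # "weatherman" pixels

distance = np.linalg.norm(frame - key_color, axis=-1)
object_mask = distance > 100                      # True where the pixel is not background

print(object_mask)   # the binary shape that would be coded as the video object
```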
MPEG-4 Applications
MPEG-4 makes video possible even at very low bit-rates (e.g., 10 kb/s)
Scalable objects for low bit-rates
mobile devices, the internet
a base layer conveys all the information at some basic quality
one or more enhancement layers can be sent to get better quality
send only the most important objects
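A hedged Python sketch of the base-plus-enhancement idea: the base layer is a coarsely quantized version of the signal, and an enhancement layer carries a finer residual so receivers with more bandwidth get better quality. The step sizes are arbitrary choices for illustration.

```python
# Sketch: layered (scalable) coding as coarse quantization plus residual.

import numpy as np

signal = np.array([13.0, 200.0, 97.0, 45.0])

base_step = 32.0
base_layer = np.round(signal / base_step) * base_step                 # coarse, low bit-rate version

enh_step = 4.0
enhancement = np.round((signal - base_layer) / enh_step) * enh_step   # residual layer

print(base_layer)                 # what a mobile receiver might decode
print(base_layer + enhancement)   # closer to the original when both layers arrive
```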
Sprites
For coding unchanged backgrounds
The background is defined and coded only once
Must be updated for each change (e.g., when the viewing angle changes)
The sprite is sent only once
New views are created by sending the new positions
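A minimal Python sketch of the sprite idea: a large background panorama is transmitted once, and later frames only carry the position of the viewing window inside it. The function and sizes are illustrative, not the MPEG-4 sprite warping syntax.

```python
# Sketch: reconstruct new background views from a sprite sent only once.

import numpy as np

sprite = np.arange(20 * 40).reshape(20, 40)       # large background, sent once

def render_view(sprite, top, left, height=10, width=16):
    """Cut the visible background out of the sprite for a new camera position."""
    return sprite[top:top + height, left:left + width]

view_a = render_view(sprite, top=0, left=0)       # initial viewing position
view_b = render_view(sprite, top=4, left=12)      # only (4, 12) had to be transmitted
print(view_a.shape, view_b.shape)
```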
Advanced Features
Map images onto computer-generated shapes
a 2D or 3D mesh may have an image mapped onto it
a few parameters deform the mesh
this generates the impression of a moving picture
rather than sending new images for each change, send commands and parameters to the viewer
pre-defined faces are particularly interesting meshes
the appearance of a face may be left to the decoder (e.g., custom facial models can be downloaded)
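A hedged Python sketch of mesh-based animation: instead of sending a new image, a couple of scalar parameters deform the mesh the image is mapped onto. The parameters below are made up for illustration; they are not MPEG-4 facial animation parameters.

```python
# Sketch: deform a tiny 2-D mesh with a few transmitted parameters.

import numpy as np

# Vertex positions the texture is attached to.
vertices = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])

def deform(vertices, stretch_x=1.0, lift_y=0.0):
    """Apply two scalar parameters instead of retransmitting the picture."""
    out = vertices.copy()
    out[:, 0] *= stretch_x          # widen the mesh
    out[:, 1] += lift_y             # raise it
    return out

frame2 = deform(vertices, stretch_x=1.1, lift_y=0.05)   # decoder warps the mapped image
print(frame2)
```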
MPEG-4 Faces
Images are laid over a wire-frame face
Send the wire-frame plus parameters
Image reconstruction at the receiver's site
Speech is generated from text in step with motions of the mouth, eyes and lips
MPEG-7
MPEG-7 (2002) focuses on description of multimedia content
modalities: image, speech, video, graphics and their combinations
MPEG-7 complements existing MPEG standards and is applicable even to non-MPEG formats (compressed or uncompressed)
MPEG-7 is driven by trends in technology, market and user needs
Applications: Video-on-Demand, News-on-Demand, interactive TV, multimedia information systems, etc.
Scope of the Standard
Provides the means for indexing, searching, filtering and managing audiovisual content
broadcast media selection (e.g., personalized TV)
multimedia editing (e.g., personalized news service)
tools may be designed for specific modalities, aspects or applications
the MPEG-7 interoperable interface defines syntax and semantics
Interoperable Services and Applications (figure)
MPEG-7 Main Tasks
Multimedia: generate customized program guides or summaries of broadcast audiovisual content
Archive: generate descriptions of audiovisual content (or elements)
Adaptation: filter and transform multimedia streams in low bit-rate environments (e.g., mobile users)
MPEG-7 Specific Tasks
Music/audio: play a few notes and return music with similar melodies or audio content
Images/graphics: draw a sketch and return images with similar graphics
Movement: describe movements and return video clips with the specified temporal and spatial relations
Scenario: describe actions and return scenarios where similar actions take place
MPEG-7 Elements
1. Descriptors (D): define the syntax and semantics of features of audio-visual content
application independent
low level: shape, motion, color, camera motion, harmonicity, timbre for audio, ...
semantic level: events, concepts, ...
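A hedged Python sketch of one low-level descriptor of the kind listed above: a coarse color histogram computed from an image. It illustrates the idea only; it is not the normative MPEG-7 color descriptor.

```python
# Sketch: a compact, searchable color histogram descriptor.

import numpy as np

def color_histogram(image, bins_per_channel=4):
    """Quantize each RGB channel and count pixels per (r, g, b) bin."""
    quantized = (image // (256 // bins_per_channel)).reshape(-1, 3)
    hist = np.zeros((bins_per_channel,) * 3)
    for r, g, b in quantized:
        hist[r, g, b] += 1
    return hist / hist.sum()        # normalized so images of any size are comparable

image = np.random.default_rng(1).integers(0, 256, (32, 32, 3))
descriptor = color_histogram(image)
print(descriptor.shape)             # (4, 4, 4): 64 numbers describe the image's colors
```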
MPEG-7 Elements (cont'd)
2. Description Schemes (DS): specify the structure and semantics of the relationships among their constituent Ds or DSs, e.g.:
a Video DS specifies the syntax and semantics for segment decomposition, attributes and their relationships
DSs related to the creation, production and access of content (e.g., property rights, parental rating, etc.)
MPEG-7 Elements (cont'd)
3. Description Definition Language (DDL): allows flexible definition of Ds and DSs, based on XML Schema
Ds and DSs are application independent
the DDL can be used to define specialized tools
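Because the DDL extends XML Schema, concrete descriptions are XML documents. A toy description assembled with Python's ElementTree gives the flavor; the element names below are simplified assumptions, not the normative MPEG-7 schema.

```python
# Sketch: assembling a toy MPEG-7-style description as XML.

import xml.etree.ElementTree as ET

mpeg7 = ET.Element("Mpeg7")
description = ET.SubElement(mpeg7, "Description")
video = ET.SubElement(description, "VideoSegment", id="shot_01")
ET.SubElement(video, "MediaTime", start="00:00:00", duration="PT12S")
ET.SubElement(video, "DominantColor", value="128 64 32")

print(ET.tostring(mpeg7, encoding="unicode"))
```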
MPEG-7 Descriptions
MPEG-7 allows descriptions at different levels of abstraction
low-level features are extracted automatically
semantic features require human interaction or textual annotation
MPEG-7 does not specify how features are extracted or used (e.g., filtering, retrieval)
their representation must conform to the MPEG-7 standard
MPEG-7 Parts
Systems: specifies functionality at system level
preparation of descriptions for efficient transport and storage
synchronization of content and descriptors
development of decoders
Description Definition Language (DDL): language for specifying new Ds and DSs
extension of XML schema
MPEG-7 Visual
Specifies a set of standardized visual Ds and DSs
Color descriptors: color space, quantization, ...
Texture descriptors: homogeneous texture, texture browsing, edge histogram, ...
Shape descriptors: for regions or contours
Motion descriptors: camera motion, trajectories, motion activity, ...
Face recognition
MPEG-7 Audio
Specifies standardized audio descriptors and description schemes for pure music, pure speech, sound effects and soundtracks
silence descriptor
spoken content descriptors
sound effects descriptors
melody contour descriptors
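As a rough illustration of what a silence descriptor captures, a hedged Python sketch that marks frames whose short-time energy stays below a threshold; the frame length and threshold are illustrative choices, not values from the standard.

```python
# Sketch: frame-level silence detection by average energy.

import numpy as np

def silent_frames(samples, frame_len=160, threshold=1e-3):
    """Return True/False per frame depending on average energy."""
    n_frames = len(samples) // frame_len
    frames = samples[:n_frames * frame_len].reshape(n_frames, frame_len)
    energy = (frames ** 2).mean(axis=1)
    return energy < threshold

rng = np.random.default_rng(2)
audio = np.concatenate([rng.normal(0, 0.2, 1600),   # speech-like segment
                        np.zeros(1600)])            # silence
print(silent_frames(audio))                         # descriptor marks the silent part
```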
Multimedia Description Schemes
Specify a framework that allows generic description of all kinds of multimedia data
basic elements: data types, structures, Ds
content management: content from several viewpoints (creation, usage, etc.)
organization of content by collections, classification
navigation and access
user interaction
MPEG-7 Reference Software
Reference implementation of the relevant parts of the MPEG-7 standard
The focus is on creating bit-streams of descriptors and description schemes (DDL parser, DDL validation, multimedia description schemes)
Some software for extracting descriptors is also included (visual and audio descriptors)