Module 1
INTRODUCTION TO MULTIMEDIA DATABASES
Prof. Dr. Naomie Salim Faculty of Computer Science & Information Systems Universiti Teknologi Malaysia
The Explosion of Digital Multimedia Information
We interact with multimedia every day
Large amounts of text, images, speech & video are converted to digital form
Advantages of digitized data over analog:
Easy storage
Easy processing
Easy sharing
Give examples of multimedia applications that deal with storing, retrieving, processing and sharing of multimedia data
Eg 1. Journalism
A journalist writes an article about the influence of alcohol on driving
The investigation involved:
Collect news articles about accidents, scientific reports, television commercials, police interviews, medical experts interviews
Illustration:
Search photo archives & stock footage companies for good photos (shocking, funny, etc.)
Other examples
Searching movies
Based on the taste inferred from movies already seen
Based on movies a friend favors
Searching on web
Eg. searching the Australian Open website (http://www.ausopen.org)
Integrate conceptual terms + interesting events, e.g. give info about video segments showing female American tennis players going to the net
Retrieval problems
EMPLOYEE (Name: char(20), City: Char(20), Photo: Image)
How do you select employees in Skudai? How do you select employees that wear a tudung, wear glasses, are fair-skinned and have a mole under the lips?
Characteristics of Media Data
Medium - Information representation
Alphanumeric
Representation of audio, video and image
Static vs dynamic
Static: do not have time dimensions (alphanumeric data, images, graphics) Dynamic: have time dimensions (video, animation, audio)
Multimedia
Collection of media types used together
At least one media type must be non-alphanumeric
Digital representation of text
OCR techniques convert analog text to digital text
Eg. of digital representation: ASCII
Uses 8 bits per character; Chinese characters require more space
Storage requirements depend on the number of characters
Structured documents becoming more popular
Docs consist of titles, chapters, sections, paragraphs, etc. Standards like HTML and XML used to encode structured information
Compression of text
Huffman, arithmetic coding
Since storage requirements are not too high, compression of text is less important than for other media data
Digital representation of audio
Audio
Air pressure waves with frequency and amplitude
Humans hear 20-20,000 Hertz
Low amplitude = soft sound
Digitizing pressure waveforms
Transform into electrical signal (by microphone) Convert into discrete values
Sampling: continuous time axis divided into small, fixed intervals
Quantization: determination of the amplitude of the audio signal at the beginning of each time interval
Humans cannot notice the difference between analog & digital given a sufficiently high sampling rate and precise quantization
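The sampling and quantization steps above can be sketched as follows (a minimal illustration, with a hypothetical `digitize` helper, not production audio code):

```python
import math

def digitize(signal, duration_s, sample_rate, bits):
    """Sample a continuous signal at fixed intervals and quantize
    each sample to a given number of bits (uniform quantization)."""
    levels = 2 ** bits
    samples = []
    n = int(duration_s * sample_rate)
    for i in range(n):
        t = i / sample_rate                            # sampling: fixed time intervals
        amplitude = signal(t)                          # continuous value in [-1.0, 1.0]
        q = round((amplitude + 1) / 2 * (levels - 1))  # quantization to a discrete level
        samples.append(q)
    return samples

# A 440 Hz tone digitized at 8 kHz with 8-bit precision
tone = lambda t: math.sin(2 * math.pi * 440 * t)
pcm = digitize(tone, duration_s=0.01, sample_rate=8000, bits=8)
print(len(pcm))   # 80 samples for 10 ms of audio
```

Higher sample rates and more bits per sample shrink the gap between the digital approximation and the analog original.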
Audio storage requirements
Example of a CD audio
16 bits per sample
44,100 samples per second
Two (stereo) channels
Requirements = 16 * 44,100 * 2 bits ≈ 1.4 Mbit per second
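The CD-audio figures multiply out as follows (using the standard CD sampling rate of 44,100 Hz):

```python
bits_per_sample = 16
sample_rate = 44_100       # standard CD sampling rate (Hz)
channels = 2               # stereo
bitrate = bits_per_sample * sample_rate * channels
print(bitrate)             # 1,411,200 bits/s, i.e. ~1.4 Mbit per second

one_minute_mb = bitrate * 60 / 8 / 1_000_000
print(round(one_minute_mb, 1))  # ~10.6 MB per minute, uncompressed
```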
Compression (examples)
Masking: discard soft sounds made inaudible by louder sounds
Speech: coding of lower-frequency sounds only
MPEG: audio compression standards
Digital representation of image
Scan analog photos & pictures using a scanner
Analog image approximated by a rectangle of small dots
In digital cameras, the ADC (analog-to-digital converter) is built in
Image consists of many small dots or picture elements (pixels)
Gray scale: 1 byte (8 bits) per pixel
Color: 3 colors (RGB) of one byte each
Data required for 1 rectangular screen:
A = x * y * b
A: number of bytes needed, x: # pixels per horizontal line, y: # horizontal lines, b: # bytes per pixel
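Applying the A = x * y * b formula to an illustrative (assumed) screen size:

```python
def image_bytes(x, y, b):
    """A = x * y * b: bytes needed for an uncompressed raster image."""
    return x * y * b

# A 1024 x 768 true-color image (3 bytes per pixel for RGB)
print(image_bytes(1024, 768, 3))  # 2,359,296 bytes = 2.25 MiB
```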
Image compression
Exploit redundancy in image & properties of human perception
Spatial redundancy: pixels in a certain area often appear similar (golden sand, blue sky)
Human tolerance: some error still allows effective communication
Eg. of image compression
Transform coding Fractal image coding
Digital representation of video
Sequence of frames or images presented at fixed rate
Digital video obtained by digitizing analog video or captured directly with digital cameras
Playing 25 frames per second gives the illusion of a continuous view
Amount of data to represent video
1 second of video: 512 lines, 512 pixels per line, 24 bits (3 bytes) per pixel, 25 frames per second
512 * 512 * 3 * 25 = 19,660,800 bytes ≈ 19 Mbytes
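The one-second figure above works out to:

```python
lines, pixels_per_line, bytes_per_pixel, fps = 512, 512, 3, 25
bytes_per_second = lines * pixels_per_line * bytes_per_pixel * fps
print(bytes_per_second)  # 19,660,800 bytes (~19 MB) for a single second
```

At this rate an uncompressed 90-minute movie needs over 100 GB, which is why video compression is essential.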
Compression of video
Compressing frames of videos: similar to image
Reduce redundancy & exploit human perception properties
Temporal redundancy: neighboring frames normally similar, remove by applying motion estimation & compression
Each image divided into fixed-sized blocks
For each block in the image, the most similar block in the previous image is determined & the pixel difference computed
Together with the displacement between the two blocks, this difference is stored or transmitted
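The block-matching step can be sketched with an exhaustive search over a small window (a toy sketch on nested lists of gray values; real codecs use larger blocks and optimized search):

```python
def best_match(prev, cur_block, top, left, search=2, bs=2):
    """Exhaustive block matching: find the block in frame `prev` most similar
    (smallest sum of absolute differences, SAD) to `cur_block`, searching a
    window of +/- `search` pixels around position (top, left)."""
    h, w = len(prev), len(prev[0])
    best = (None, float("inf"))
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            ty, tx = top + dy, left + dx
            if not (0 <= ty <= h - bs and 0 <= tx <= w - bs):
                continue  # candidate block would fall outside the frame
            sad = sum(abs(prev[ty + i][tx + j] - cur_block[i][j])
                      for i in range(bs) for j in range(bs))
            if sad < best[1]:
                best = ((dy, dx), sad)
    return best  # (displacement vector, residual cost)

# Previous frame has a bright 2x2 patch at (1, 1); the block being coded
# at (0, 0) in the current frame matches it, so the motion vector is (1, 1)
prev = [[0, 0, 0, 0],
        [0, 9, 9, 0],
        [0, 9, 9, 0],
        [0, 0, 0, 0]]
cur_block = [[9, 9], [9, 9]]
print(best_match(prev, cur_block, 0, 0))  # ((1, 1), 0)
```

Only the displacement vector and the (usually small) residual difference need to be stored or transmitted.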
MPEG-1 (VHS quality, pixel-based coding): coding of video data up to a speed of 1.5 Mbit per second
MPEG-2 (pixel-based coding): coding of video data up to a speed of 10 Mbit per second
MPEG-4 (multimedia data, object-based coding): coding of video data up to a speed of 40 Mbit per second; tools for decoding & representing video objects; supports content-based indexing & retrieval
How to search for images or multimedia data?
Analyze one by one? No! It takes too long!
Use metadata instead of searching the content directly: search the metadata that has been added to the object
Metadata requirements to be valuable for searching:
Description of the multimedia object should be as complete as possible
Storage of metadata must not take too much overhead
Comparison of two metadata values must be fast
Metadata of Multimedia Objects
Descriptive data
Gives format or factual info about the multimedia object
Eg.: author name, creation date, length of the multimedia object, representation technique
Eg. standard for descriptive data: Dublin Core
Can use SQL (metadata condition in the WHERE clause)
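A descriptive-data query over an in-memory SQLite table illustrates the idea; the schema and rows are hypothetical:

```python
import sqlite3

# In-memory table of descriptive metadata (illustrative schema and values)
con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE movie
               (title TEXT, director TEXT, length_min INTEGER)""")
con.executemany("INSERT INTO movie VALUES (?, ?, ?)",
                [("Jaws", "Steven Spielberg", 124),
                 ("Alien", "Ridley Scott", 117)])

# Metadata condition goes in the WHERE clause, just like ordinary data
rows = con.execute("SELECT title FROM movie "
                   "WHERE director = 'Steven Spielberg'").fetchall()
print(rows)  # [('Jaws',)]
```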
Metadata of Multimedia Objects (cont.)
Annotations
Textual description of the contents of objects
Eg.: photo description in Facebook
Either free format or a sequence of keywords
Manual text annotations allow Information Retrieval techniques to be used, but are:
Time consuming, expensive Subjective, incomplete
Structured concepts (eg. Semantic Web, ER-like schemas) can be used to describe content through concepts, their relationships to each other & the MM object, but are:
Also slow and expensive
Metadata of Multimedia Objects (cont.)
Features
Derive characteristics from the MM object itself
Need a language to describe features, eg. MPEG-7
The process of capturing features from an MM object is called feature extraction
Performed automatically, sometimes with human support
Two feature classes
Low-level features High-level features
Low-level Features
Grasp data patterns & statistics of MM object Depend strongly on medium Extraction performed automatically Eg. for text
List of keywords with frequency indicators
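A keyword-frequency list can be derived with a few lines of Python (the stopword list is an illustrative assumption):

```python
from collections import Counter
import re

# A tiny illustrative stopword list; real systems use much larger ones
STOPWORDS = {"the", "a", "of", "and", "to", "in", "is"}

def keyword_frequencies(text):
    """Low-level text feature: keywords with frequency indicators."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(w for w in words if w not in STOPWORDS)

doc = "The referee stopped the football match; the match resumed later."
print(keyword_frequencies(doc).most_common(3))
```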
Eg. for audio
Representation
Amplitude-time sequence: quantification of air pressure at each sample
Silence: 0; pressure above the silence level: positive amplitude; below: negative amplitude
Eg. Low-level features derived
Energy: loudness of the signal
ZCR (zero crossing rate): frequency of sign changes; high ZCR indicates speech
Silence ratio: low silence ratio indicates music
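These three audio features can be computed directly from the amplitude-time sequence (a minimal sketch on plain Python lists of samples in [-1, 1]):

```python
def energy(samples):
    """Mean squared amplitude: a measure of the loudness of the signal."""
    return sum(s * s for s in samples) / len(samples)

def zero_crossing_rate(samples):
    """Fraction of adjacent sample pairs whose sign changes."""
    crossings = sum(1 for a, b in zip(samples, samples[1:])
                    if (a >= 0) != (b >= 0))
    return crossings / (len(samples) - 1)

def silence_ratio(samples, threshold=0.1):
    """Fraction of samples whose magnitude is below a silence threshold."""
    return sum(1 for s in samples if abs(s) < threshold) / len(samples)

noisy = [0.5, -0.4, 0.6, -0.5, 0.4, -0.6]   # rapid sign changes: speech-like
print(zero_crossing_rate(noisy))            # 1.0 (every pair crosses zero)
```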
Low-level features (cont.)
Eg. for images
Color histograms: # pixels having color in a certain range
Spatial relationships: eg. a blue pattern appears above a yellow one (beach photo)
Contrast: # dark spots neighboring light spots
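A color histogram for a single channel reduces to counting pixels per intensity range (a sketch over a flat list of 0-255 values; real images have three channels and many more bins):

```python
def color_histogram(pixels, bins=4):
    """Count pixels whose intensity falls in each of `bins` equal ranges
    of the 0-255 scale (single-channel histogram)."""
    hist = [0] * bins
    width = 256 // bins
    for p in pixels:
        hist[min(p // width, bins - 1)] += 1
    return hist

# A mostly-bright "sky" channel: most pixels land in the top bin
sky = [200, 210, 220, 250, 30, 90]
print(color_histogram(sky))  # [1, 1, 0, 4]
```

Two images with similar histograms can be treated as candidates for a color-based match, regardless of where in the image the colors occur.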
Eg. for video
Use the low-level features for images
Eg. of the temporal dimension: shot change, detected when the pixel difference between two images is higher than a certain threshold
Shot: sequence of images taken with the same camera position
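The threshold-based shot-change detector described above can be sketched as (toy frames as flat pixel lists; the threshold value is an assumption):

```python
def shot_changes(frames, threshold):
    """Flag a shot change wherever the mean absolute pixel difference
    between consecutive frames exceeds a threshold."""
    changes = []
    for i in range(1, len(frames)):
        diff = sum(abs(a - b) for a, b in zip(frames[i - 1], frames[i]))
        if diff / len(frames[i]) > threshold:
            changes.append(i)
    return changes

# Four tiny 4-pixel frames: a cut happens at frame index 2
frames = [[10, 10, 10, 10],
          [11, 10, 10, 11],
          [200, 190, 205, 210],
          [201, 191, 204, 209]]
print(shot_changes(frames, threshold=50))  # [2]
```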
High-level features
Features which are meaningful to the end user, such as golf course, forest
How can we bridge the semantic gap between low-level and high-level features?
High-level feature extraction from low-level features
Eg. text containing the words football, referee → football match text
Eg. speech-to-text translators (low-level audio features to text)
Eg. video (domain-specific): loud sound from the crowd, round object passing a white line, followed by a sharp whistle → goal
Multimedia Information Retrieval System (MIRS)
Component of MIRS - Archiving
MM data stored separately from its metadata
Voluminous
Visible or audible delays in playback are unacceptable
MM data managed separately in MM content server
Objects get an identification at storage time, to be used by the other parts of the MIRS
Have to deal with compression and protection
Component of MIRS Feature Extraction (Indexing)
Extraction of metadata (annotations, descriptions, features) from incoming multimedia object Algorithms have to consider extraction dependencies. Eg.:
Video object segmented; choose a key frame for each segment
Extract low-level features from the key frame
Based on low-level features, classify into shots of audience, fields, close-ups
For field shots, detect positions of players
Extract body-related features of players
Determine where net playing begins and ends
Have to consider incremental maintenance (modification of MM objects, extractors, extraction dependencies)
Incremental Maintenance in ACOI Feature Extraction Architecture
Component of MIRS - Searching
Multimedia queries are diverse and can be specified in many different ways
No exact match; many ways to describe MM objects
Specifying the information need:
Direct: user specifies the info need herself
Indirect: user relies on other users
Possible Querying Scenarios
Possible Querying Scenarios (cont.)
Queries based on Profile
Users expose preferences in one way or another
Preferences stored in a user profile in the MIRS
If unsure, a trusted friend's profile can be used
Queries based on Descriptive Data
Based on format and facts about the MM object
Eg. all movies with Director = Steven Spielberg
Possible Querying Scenarios (cont.)
Queries based on Annotations
Text-based: keywords or natural language
Eg. Show me a video in which Barack Obama shakes hands with Mahathir Mohamad
Set of keywords derived from the query & compared with keywords in the annotations of movies
Queries based on Features
Content-based queries: features derived (semi-)automatically from the content of the MM object
Low & high-level features are used
Eg. Find all photos with a color distribution like this photo
Eg. Give me all football videos in which a goal is scored within the last ten minutes
goal is a high-level feature that must be known to the MIRS
Possible Querying Scenarios (cont.)
Query by example
Give an example MM object
MIRS extracts all kinds of features from the MM object
Resulting query based on these features
Similarity
Degree to which the query & an MM object of the MIRS are similar
Similarity calculated by the MIRS based on the metadata of the MM object & the query
Tries to estimate the relevance of the MM object to the user
Output is a list of MM objects in descending order of similarity value
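Ranking by similarity can be sketched with cosine similarity over feature vectors (the vectors and file names below are illustrative assumptions, not a prescribed similarity measure):

```python
import math

def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Hypothetical feature vectors extracted from MM objects
collection = {"beach.jpg": [0.9, 0.1, 0.0],
              "forest.jpg": [0.1, 0.9, 0.2],
              "desert.jpg": [0.8, 0.2, 0.1]}
query = [1.0, 0.0, 0.0]

ranked = sorted(collection,
                key=lambda name: cosine(query, collection[name]),
                reverse=True)   # descending order of similarity
print(ranked)  # ['beach.jpg', 'desert.jpg', 'forest.jpg']
```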
General Retrieval Model
Relevance Feedback
Helps when the user doesn't know exactly what he is looking for, causing problems in query formulation
Interactive approach:
User issues a starting query, MIRS composes a result set, user judges the output (relevant/not), MIRS uses the feedback to improve the retrieval process
Component of MIRS - Browsing
Users sometimes cannot precisely specify what they want, but can recognize it when they see it
Browsing lets the user scan through objects
Exploits hyperlinks which lead the user from one object to another
When an object is shown, the user judges its relevance & proceeds accordingly
If objects are huge, icons are used
Starting point
A query that describes the info need, or the system provides a starting point
User can ask for another starting point if not satisfied
Objects can be classified based on topics & subtopics
Component of MIRS Output Presentation (Play)
When MIRS returns list of objects, system has to decide whether user has right to see them User interface should be able to show all kinds of MM data What if objects are huge and result set large?
Give user perception of content of object Extract & present essential info for user to browse & select objects
Text: title, summary, places where keywords occur
Audio: tune, start of the song
Images: thumbnails (summaries of images)
Video: cut into scenes and choose a prime image for each scene
Component of MIRS Output Presentation (cont.)
Streaming
Content sent to the client at a specific rate and, except for buffering, played directly
Audio & video delivered as a continuous stream of packets
When resources become scarce:
Use switched Ethernet instead of shared Ethernet
Use disk striping
Skip frames during play-back
Fragment content over several content servers (needs a logical component between client & servers to direct client requests to the corresponding server)
Quality of MIRS
Recall
r/R
r: # of relevant objects returned by system, n: # objects retrieved, R: # relevant objects in collection
Precision
r/n
Relevance judged by humans, refer to TREC (Text Retrieval Conference)
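The two definitions translate directly into code (example counts are illustrative):

```python
def recall(r, R):
    """r: # relevant objects returned by the system,
    R: # relevant objects in the collection."""
    return r / R

def precision(r, n):
    """r: # relevant objects returned by the system,
    n: # objects retrieved."""
    return r / n

# The system retrieves 10 objects, 4 of them relevant;
# the collection contains 8 relevant objects in total
print(recall(4, 8))      # 0.5  -> half of all relevant objects were found
print(precision(4, 10))  # 0.4  -> 40% of the retrieved objects are relevant
```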
Exercise
Discuss the role of DBMS in storing MM objects Discuss the role of Information Retrieval systems in storing MM objects
End of Module 1