Video Transcoding Architectures:
Designing Scalable Internet Video
Distribution Systems
David Lariviere and Professor Luca Carloni
Abstract: In recent years, the world has witnessed an explosion in both consumer interest
in and adoption of digital and Internet-based video. As the quality and sheer quantity
of both personal and publicly available online video sources continue to grow, one
must consider how to process and distribute this enormous amount of data using novel
system architectures targeted at the specific attributes and requirements of the
intended service.
Table of Contents
Taxonomy of Video Distribution Models:
Taxonomy Terminology:
Video Distribution System Examples
Live Internet Distribution of Existing TV Broadcast Material
On-Demand TV Episodes & Movies
User-Generated Content Video Distribution
Digital Video Recorders and Streaming Servers
Peer-to-Peer (P2P) Video File Sharing Networks:
Video Transcoding Architectures
Background on Video Codecs
Compression
Decompression
Transcoding
Advantages of Transcoding:
Transcoding and Distribution Architectures
Transcoding Hardware Components:
General Purpose Processing
Graphics Card-Assisted Video Processing
Custom Hardware (DSPs, FPGAs, and ASICs)
Consumer Electronics Components:
Hardware Components Summary
Video Distribution Architectures
Example Distribution Architectures
Personal DVR System Architecture
User Generated Video Hosting System Architecture
YouTube Case Study: Estimating Architecture Requirements
Summary:
Background:
Input Formats:
Standard Definition (DV):
High Definition:
YouTube Video Specifications:
YouTube Usage Statistics:
Required Transcoding Capacity:
Video Capture Method:
Transcoding:
Memory Consumption and Initial Transcoding Benchmarks:
Analysis:
Architectural Recommendations
Conclusions
Future Work
Taxonomy of Video Distribution Models:
In an effort to organize a taxonomy by which one may classify and understand the
characteristics that distinguish types of video distribution systems, it is necessary to
define a set of attributes describing the requirements and inherent properties of the
systems with respect to input and output.
Taxonomy Terminology:
• Video Input Data Source: The transportation mechanism and actual content that
a video distribution system receives to be distributed.
o Volume: The number of input data sources being simultaneously inputted
into the system per some unit of time.
o Persistent Storage: The amount of storage, if any, that a video
distribution system will require to store the incoming input data source. If
none, then the input data source can be discarded immediately after
processing.
o Note: The distinction between Video Input Data Source and “Video Input”
is chosen carefully in order to emphasize that the input to the distribution
system may not be a video file. For example, in a video distribution
system designed to record and retransmit broadcast HDTV over the
internet, the input signal is a radio signal that must be captured with an
HDTV capture card before it can be converted to and processed as a video
file. The distinction is important because the additional necessity of
capturing the input with specialized decoders has implications for the
choice of transcoding architecture.
• Latency Restriction: Defines a possible hard upper bound on the amount of time
between when an Input Data Source is provided and when the Output Data Source is
made available by the distribution system.
o Note that latency restrictions are often but not always correlated with
downloaded vs. streamed distribution. When “downloading” content, the
entire output source is often downloaded before the end user may begin to
watch it, whereas with streaming distribution, simultaneous transmission
and viewing of the video can occur, given sufficient bandwidth.
• Video Output Data Source: The actual video content made available by the
distribution system that was derived from the Input Data Source.
o Demand: The number of simultaneous users that request the output data
source.
o Scalability: A measurement of how well a specific distribution system
will scale with the number of simultaneous users.
o Customization: Describes to what degree the output video is customized
for individual users, as determined by each user’s individual bandwidth
and quality requirements and capabilities.
o Persistent Storage: The amount of video data (if any) that must be
permanently stored and accessible by the video distribution system.
o Content Protection: Describes whether the owners of the material
being transmitted require that DRM be applied to protect their content from
unauthorized distribution.
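One way to make the taxonomy concrete is to record each system's attributes in a small data structure. The following is only an illustrative sketch: the class and field names are assumptions introduced here, and the qualitative levels are kept as free-form strings copied from the attribute lists in this paper (the example values are those listed for live Internet distribution of broadcast TV).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class InputDataSource:
    volume: str               # e.g. "Low to Moderate"
    persistent_storage: str   # e.g. "None"


@dataclass(frozen=True)
class OutputDataSource:
    demand: str
    scalability: str
    customization: str
    persistent_storage: Optional[str]  # None when no value is stated
    content_protection: str


@dataclass(frozen=True)
class DistributionSystem:
    name: str
    input_source: InputDataSource
    latency_restriction: str
    output_source: OutputDataSource


# Attribute values as listed for live Internet distribution of broadcast TV.
LIVE_TV = DistributionSystem(
    name="Live Internet Distribution of Existing TV Broadcast Material",
    input_source=InputDataSource(volume="Low to Moderate", persistent_storage="None"),
    latency_restriction="High",
    output_source=OutputDataSource(
        demand="High",
        scalability="Moderate to High",
        customization="Low to High",
        persistent_storage="Low",
        content_protection="Yes",
    ),
)
```

Keeping the levels as strings avoids imposing an ordering that the taxonomy itself does not define.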
Video Distribution System Examples
Live Internet Distribution of Existing TV Broadcast Material
Description:
Note that live video capture and streaming distribution systems are expected to scale
fairly well, primarily because the actual number of possible simultaneous video output
streams is small, allowing for efficient buffering and serving of content.
Defining Attributes:
• Input Data Source:
o Volume: Low to Moderate
o Persistent Storage: None
• Latency Restriction: High
• Output Data Source:
o Demand: High
o Scalability: Moderate to High
o Customization: Low to High
o Persistent Storage: Low
o Content Protection: Yes
On-Demand TV Episodes & Movies
Description:
On-Demand Internet-based TV and movie distribution systems have been gaining
momentum. Within the last two years, all of the major television broadcasters
(ABC, CBS, NBC, and Fox) have begun offering on-demand streaming of their video
content on their websites, with some also offering digital downloads via iTunes. In
addition, both Netflix and Blockbuster offer IP-based internet streaming of a small
portion of their movie libraries. Note that the existing product offerings of both the major
broadcasters and the movie rental companies are still fairly limited in the number of
videos simultaneously available for on-demand retrieval.
While convenient, the PC-based distribution mechanisms have yet to supplant traditional
broadcast television. This may soon change, however, as numerous companies introduce
products bringing Internet-based video out of the PC and into home entertainment systems and
living rooms. Several examples include Microsoft’s demonstration at last year’s CES of
IPTV via the Xbox 360, and the introduction of the Apple TV, a small device connected to
a television from which one can view both iTunes-purchased movies and YouTube
videos. Netflix has also recently announced a partnership with LG to develop a set-top
box capable of receiving and displaying streamed videos.
Comparison:
Note that while similar to Live IP-based TV distribution systems, On-Demand video
distribution has several important distinctions: 1) Latency requirements are significantly
relaxed, as the input material is pre-recorded rather than provided in real time. 2) The
number of new input videos arriving per unit of time is significantly
lower than that of the live broadcast system. While broadcast systems are constantly
streaming data, no matter how repetitive the material, on-demand systems can store the
material once, and then transmit it multiple times as needed. 3) Output Persistent
Storage is large, by necessity of all available output being “On-Demand” at all times,
whereas live streaming services only need to keep the data currently being inputted to the
system at that time, typically on the order of 10^2 channels. 4) On-Demand
commercially created video content tends to be the highest quality among any of the
video distribution systems. Because the content is not being streamed and processed in
real-time, it allows for higher quality transcoding.
Defining Attributes:
• Input Data Source:
o Volume: Low to Moderate
o Persistent Storage: Moderate to Large
• Latency Restriction: Low
• Output Data Source:
o Demand: High
o Scalability: Moderate to High
o Customization: Low to moderate
o Persistent Storage: Moderate to Large
o Content Protection: Yes
User-Generated Content Video Distribution
Description:
• User-generated video hosting sites, such as YouTube, are enormously popular.
According to Ellacoya Networks [1], a provider of Broadband Service
Optimization technology, in June of 2007 HTTP-based streaming video surpassed
P2P traffic as the single largest consumer of internet bandwidth, consuming 43%
of all network traffic vs. P2P’s 37%. It was further estimated that 10% of all
internet traffic was attributable solely to streaming videos on YouTube. A case
study estimating the requirements of transcoding all uploaded user content for a
YouTube-sized service can be found in the “YouTube Case Study” section.
Comparison
User Generated Video Portals, such as YouTube, represent an extreme boundary case in the
video distribution taxonomy. Such distribution systems must be architected with the
highest regard for both processing and preparing for distribution the vast amount
of constantly incoming user-generated video data, while also distributing the already
colossal library of videos. Further, note that while demand for the system as a whole may
be extremely high, there is expected to be a wide variance in the demand for individual
files. As covered in more detail in the case study, user generated video portals, most
likely as a result of their extremely large existing library size, do not make an extensive
effort to custom-tailor individual streams to match the parameters of clients. Further, the
quality tends to be among the lowest of any video distribution systems.
Defining Attributes:
• Input Data Source:
o Volume: Extremely High
o Persistent Storage: Extremely Large
• Latency Restriction: Moderately Low
• Output Data Source:
o Demand: Extremely Low to Extremely High per file. Extremely High on
Service as a whole.
o Scalability: Extremely High
o Customization: low (one to two universal formats)
o Content Protection: No
Digital Video Recorders and Streaming Servers
Description:
End-User DVRs and Remote Access systems can be either cable company-supplied DVR
boxes, separate networked hardware components, or PC capture cards that can be utilized
to turn existing PCs into personal DVRs and streaming video servers.
Defining Attributes:
• Input Data Source:
o Volume: Extremely Low
o Persistent Storage: Low to Moderate
• Latency Restriction: High
• Output Data Source
o Demand: Extremely Low
o Scalability: Extremely Low
o Customization: Medium to High.
o Persistent Storage
o Content Protection: No
Peer-to-Peer (P2P) Video File Sharing Networks:
Description:
Peer-to-Peer networks arguably formed the first massively large video distribution
networks on the internet. Not long after Napster introduced the world to the distributive power
of the internet for spreading copyrighted music, people began to share movies
in large numbers as well.
The evolution and history of P2P networks for the distribution of large volumes of videos
can be divided by a single historical event into two separate periods: Before and
After BitTorrent. BitTorrent is a protocol, a software program, and a company of the same name.
The protocol was first developed in the summer of 2001. BitTorrent, more than any other
P2P or other distribution mechanism, facilitated the mass transfer of large amounts of
data, especially videos. The primary genius of the BitTorrent protocol is found in the
manner in which users download and upload.
First it helps to contrast BitTorrent with another well known P2P protocol: Gnutella.
Gnutella was one of, if not the, first distributed non-centralized P2P file sharing protocols.
When a user runs a Gnutella client, their computer randomly begins searching IP
addresses looking for other Gnutella clients. Upon finding a Gnutella client, it can then
learn about other computers also on the network. More importantly, it can submit
search requests. A user’s query is passed on to other users, who in turn pass the
query on to each of the other users they are connected to, and so on, as the
query propagates throughout the network. In the case that someone on the network is
sharing a file that matches the search description, their client then contacts the
querying client to notify it of the match. If the user indeed wants to download the file,
they then request it and the transmission starts. Note that if another user wants to
download the same file, the process repeats all over, and there is no logical relationship
between the first user currently downloading that file, and the next user.
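The query propagation described above can be sketched as a breadth-first flood over the peer graph. This is a simplified illustration, not the Gnutella wire protocol: the function name, the dictionary-based network model, the substring match, and the `max_hops` limit are all assumptions introduced here.

```python
from collections import deque


def flood_query(neighbors, shared_files, origin, keyword, max_hops=4):
    """Flood a search query outward from origin; return peers holding a match.

    neighbors:    dict mapping each peer to the peers it is connected to
    shared_files: dict mapping each peer to the file names it shares
    max_hops:     assumed limit on how far the query is forwarded
    """
    seen = {origin}                  # peers that have already received the query
    frontier = deque([(origin, 0)])
    matches = []
    while frontier:
        peer, hops = frontier.popleft()
        if peer != origin and any(keyword in name for name in shared_files.get(peer, [])):
            matches.append(peer)     # this peer would contact the querying client
        if hops == max_hops:
            continue                 # do not forward the query any further
        for nxt in neighbors.get(peer, []):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, hops + 1))
    return matches


# Hypothetical five-peer network.
network = {"a": ["b", "c"], "b": ["a", "d"], "c": ["a"], "d": ["b", "e"], "e": ["d"]}
files = {"d": ["holiday.avi"], "e": ["holiday_2.avi"]}
near = flood_query(network, files, "a", "holiday", max_hops=2)
far = flood_query(network, files, "a", "holiday", max_hops=4)
```

Each search starts from scratch, which mirrors the observation that two users downloading the same file have no relationship to each other.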
BitTorrent is different. First, it works by way of “.torrent” files containing metadata and
contact information for one or more BitTorrent trackers, which are computers that facilitate
the file transmissions. Once a user downloads a .torrent file, their client connects to the trackers,
which then provide the client with a list of other clients from which the file can be
downloaded. Unlike traditional implementations of Gnutella clients (which later adopted
some BitTorrent techniques), the client will download small portions of files from many
users simultaneously, instead of opening a single large HTTP file download from a single
user. Most importantly, the set of all users either hosting or downloading the file forms a
“swarm” of which all users are aware.
The single most powerful aspect of BitTorrent is that, in order for a user to be provided
bandwidth by the swarm with which to download the file, that user must also upload data
already downloaded to other members of the swarm; thus a user is constantly both
downloading and uploading the file at the same time. The rate at which a user is sent
data by other members will often grow linearly with the rate at which that user uploads to
other members of the swarm; hence those most likely to contribute to an increase in
bandwidth are prioritized over those who do not.
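The linear relationship between upload and download rates can be illustrated with a proportional-share model. This is a deliberately simplified sketch and not BitTorrent's actual peer selection algorithm: the function name and the idea of a single pooled `swarm_capacity` are assumptions made for illustration.

```python
def allocate_download_rates(upload_rates, swarm_capacity):
    """Split swarm_capacity among peers in proportion to their upload rates.

    upload_rates:   dict mapping peer name to its upload rate (any unit)
    swarm_capacity: total bandwidth the swarm can hand out (same unit)
    """
    total = sum(upload_rates.values())
    if total == 0:
        # Nobody contributes, so nobody is rewarded in this simple model.
        return {peer: 0.0 for peer in upload_rates}
    return {peer: swarm_capacity * rate / total for peer, rate in upload_rates.items()}


# Hypothetical swarm: peer "c" uploads nothing and therefore receives nothing.
rates = allocate_download_rates({"a": 100, "b": 300, "c": 0}, swarm_capacity=800)
```

In this model a peer that triples its upload rate relative to another receives three times the download bandwidth, which is the incentive the paragraph above describes.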
Comparison:
The defining aspect of BitTorrent-like P2P architectures is the distributed storage and
distribution of content, extensively leveraging the end user. Architectures based upon this
model are capable of extreme scalability without being subject to the costs associated with
traditional large scale hosting of content.
Defining Attributes:
• Input Data Source:
o Volume: Extremely High
o Persistent Storage: Extremely Large (Distributed)
• Latency Restriction: None
• Output Data Source:
o Demand: Moderate
o Scalability: Extremely High
o Customization: None
o Storage Requirement: Extremely High.
o Content Protection: Yes and No (usually not)
Video Transcoding Architectures
Background on Video Codecs
Note that for clarity, the following discussion refers only to video. This is without loss of
generality, as any statement made about the encoding or decoding of video is equally true
for audio.
Compression
When dealing with digital representations of videos, standardized compression
algorithms, also known as codecs, are used to compress the video, substantially reducing
the file size, ideally without reducing the perceptual quality. Video compression is the
process of compressing raw frames of video, each represented as a two dimensional array of
pixel values, where each pixel value indicates the intensity of light that fell upon the pixel
when the image was taken. After the raw video frames are compressed, the encoder outputs a
bitstream that has been first lossily and then losslessly compressed and that adheres to
the specific video codec’s standard.
Decompression
Decompression is the inverse process to compression. A decoder takes as input the
compressed bitstream outputted by the encoder, and then generates a best approximation of
the original raw video frames. Note that compression is usually not exactly invertible, as
information is typically lost in the compression stage that cannot be accurately recovered.
Transcoding
Transcoding is the process of taking an already compressed video file and changing
either the codec or the parameters of the codec (bitrate, quality, image size, etc). This
is done by first decoding the file using the original codec’s decoder, and then
re-encoding the decoded buffer using either a new encoder or the same encoder with different
parameters.
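As a concrete illustration of the decode and re-encode step, the sketch below assembles an ffmpeg command line (ffmpeg is the tool used for the benchmarks later in this paper). The helper name and the example parameter values are assumptions introduced here, and these are not necessarily the options used for the paper's measurements.

```python
def build_transcode_command(src, dst, video_codec, bitrate_kbps, width, height, fps):
    """Return an ffmpeg argument list that decodes src and re-encodes it to dst."""
    return [
        "ffmpeg",
        "-i", src,                    # input: the already compressed file to decode
        "-c:v", video_codec,          # encoder used for the output video stream
        "-b:v", f"{bitrate_kbps}k",   # target video bitrate
        "-s", f"{width}x{height}",    # output frame size
        "-r", str(fps),               # output frame rate
        dst,
    ]


# Hypothetical example: shrink a camcorder capture to a 480x360 MPEG-4 file.
command = build_transcode_command("input.avi", "output.mp4", "mpeg4", 493, 480, 360, 30)
```

The list can be passed to `subprocess.run(command, check=True)` on a machine where ffmpeg is installed; building the arguments separately keeps the sketch runnable without it.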
Advantages of Transcoding:
Many scenarios exist in which transcoding can be quite useful:
1. Codec Upgrading: Transcoding can be done to upgrade a compressed video that
was stored with an earlier, inferior codec to a newer codec which can generate
substantially better quality video in the same amount of space, or the same quality
video with a substantially smaller number of bits.
2. Device Compatibility: Often consumer electronics devices (cell phones, Video
iPods, Xbox 360s, etc) support only a subset of available codecs and also a
maximum compatible quality for each of them. For example, the 5th Generation
Video iPods are capable of playing video files up to the following [3]:
a. H.264 encoded up to 768Kbps @ 320x240, 30fps, using Baseline Profile
up to Level 1.3
b. MPEG-4 Video up to 2.5Mbps @ 480x480, 30 frames per sec with Simple
Profile.
Therefore, trying to play an H.264 video encoded at a resolution greater than
320x240 would fail, most likely because the iPod’s CPU is not powerful
enough to decode the file in real time. By transcoding an incompatible input
file, one can create device compatible video files customized for each target
device. Note further that different codecs have substantially different
computing requirements, with H.264, for example, being considerably more
costly than MPEG-4.
3. Network Compatibility: When streaming video over the internet (where playback
starts before finishing the download), the available network bandwidth can often
be less than the bitrate (in bits per second) of the video file being sent. The result
is that playback stops until the client has downloaded enough video to resume
playback. When the bandwidth is consistently lower than the bitrate of the encoded
video, the video player will either halt for an enormous amount of time, or
constantly switch between playing and buffering, depending on the
implementation. One advanced solution to the problem
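The device compatibility scenario in the list above lends itself to a simple check that decides whether a file must be transcoded for a target device. The sketch below uses the 5th Generation iPod limits quoted in item 2; the class, the dictionary layout, and the function name are assumptions for illustration, and codec profile and level constraints are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VideoInfo:
    codec: str
    bitrate_kbps: int
    width: int
    height: int
    fps: float


# Per-codec maximums quoted above for the 5th Generation iPod.
IPOD_5G_LIMITS = {
    "h264": VideoInfo("h264", 768, 320, 240, 30),
    "mpeg4": VideoInfo("mpeg4", 2500, 480, 480, 30),
}


def needs_transcoding(video, device_limits):
    """Return True if the video exceeds any limit or uses an unsupported codec."""
    limit = device_limits.get(video.codec)
    if limit is None:
        return True  # codec is not supported by the device at all
    return (
        video.bitrate_kbps > limit.bitrate_kbps
        or video.width > limit.width
        or video.height > limit.height
        or video.fps > limit.fps
    )


too_large = needs_transcoding(VideoInfo("h264", 700, 640, 480, 30), IPOD_5G_LIMITS)
playable = needs_transcoding(VideoInfo("mpeg4", 1500, 480, 360, 30), IPOD_5G_LIMITS)
```

A transcoding service could run such a check per target device and only re-encode the files that fail it.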
Transcoding and Distribution Architectures
Transcoding Hardware Components:
General Purpose Processing
Of the four “Key Features” of the latest line of Intel Quad Core Processors, two are
directly related to HD image processing. The marketing strategy shouldn’t come as a
surprise, given that video processing is both one of the most CPU intensive tasks and one
of the best candidates for leveraging additional cores efficiently. Intel hasn’t been the
only silicon manufacturer to notice this…
Graphics Card-Assisted Video Processing
ATI Avivo HD
Starting with the ATI Radeon HD 2600 and 2400 lines of GPUs, ATI has introduced its
“Unified Video Decoder” (UVD), a dedicated hardware block specifically designed for
decoding, offering complete GPU-based decoding of both H.264 and VC-1 (latest
generation video codec required as part of the advanced profile for HD DVD and
Blu-ray) [5]. While graphics cards have provided basic decoding acceleration, including
for MPEG2, for years, they may not have used dedicated hardware nor offered the same level of
decoding acceleration. Specifically, the dedicated hardware blocks in the UVD also
support context-adaptive binary arithmetic coding (CABAC) decoding, which can account
for a significant portion of the CPU cycles spent on decoding, especially as the
performance impact is directly proportional to the bit stream length.
NVIDIA PureVideo HD
Custom Hardware (DSPs, FPGAs, and ASICs)
Existing commercial products run the complete gamut from individual hardware IP
cores targeting a single profile of H.264 [7] all the way to complete end-to-end systems
enclosed in a single rackspace unit, capable of not only transcoding but also serving the
output video format via an alphabet soup of different transmission protocols [8].
Hardware Components Summary
Upon examining the variety and availability of components for designing the transcoding
and even the distribution system, it becomes clearer just how large a problem space and
possible solution space system architects face when designing video transcoding
and distribution systems.
The good news is that for virtually every scenario conceived, including the examples
given in the taxonomy section, highly efficient methods exist for transcoding video.
Video Distribution Architectures
A video distribution system can be partitioned, both logically and physically,
in many different ways. One method of generalizing such systems, and abstracting away
the implementation details of a particular architecture, is to consider a video distribution
system as consisting of three main blocks: 1) Video Capture/Inputting, 2)
Transcoding/Image Processing, and 3) Transmission/Serving of Output Content.
Note that the details of a particular architecture may often further subdivide and complicate
this picture, but without loss of generality, we can model a complete Video Distribution
System as a sequence of obtaining, processing, and then distributing the inputted content.
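The three-block decomposition can be expressed as a pipeline of interchangeable stages. This is a minimal sketch under the assumption that content moves through the system in discrete chunks; the function name and the stand-in stages are hypothetical and only illustrate the structure.

```python
from typing import Callable, Iterable, List


def run_pipeline(
    capture: Callable[[], Iterable[bytes]],
    transcode: Callable[[bytes], bytes],
    serve: Callable[[bytes], None],
) -> int:
    """Push every captured chunk through transcoding and then serving."""
    processed = 0
    for chunk in capture():
        serve(transcode(chunk))
        processed += 1
    return processed


# Stand-in stages (hypothetical): real stages would wrap a capture card,
# an encoder, and a streaming server respectively.
sent: List[bytes] = []
processed = run_pipeline(
    capture=lambda: [b"frame1", b"frame2"],
    transcode=bytes.upper,
    serve=sent.append,
)
```

Because each stage is just a callable, a single self-contained device or a farm of separate machines can be modeled by swapping the stage implementations.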
Example Distribution Architectures
Personal DVR System Architecture
Note that the Slingbox Pro has many possible inputs, including a cable TV tuner, along
with RCA, S-video, and HDMI inputs and outputs. It also has an Ethernet port for
connecting it to a network. Once on the network and configured via a PC-based installation
program, the Slingbox is ready to serve video to external devices. In the context of
the introduced system decomposition, the Slingbox is a self-contained single module,
responsible for video capture, transcoding, and serving of output.
User Generated Video Hosting System Architecture
One possible (and extremely common) dataflow between the service and users is as
follows:
• Users create video data, either recording it with video cameras or generating it
directly on a PC.
• If the content was recorded with a standard digital video camera, then the
user must first record and transcode the video in order to significantly shrink the
file size.
• Given a precompressed file, the user can now submit the file to a video hosting
site, such as YouTube.
• The site may then optionally transcode the input video.
• Lastly, the site makes it available for streaming.
Several architectural design decisions must be made. First, note that the service must provide not
only direct links to the video files, but also an interactive website that can be searched
and interacted with by users in order to locate and share new videos. This implies that a
substantial web hosting capability is necessary.
One of the largest open decisions to be made involves the transcoding architecture. If the
content is to be uploaded by the user, should it be transcoded on the user’s machine before
being sent to the service, or uploaded to the service for transcoding?
If the system will transcode user-submitted content, a system architect must also decide
how to transcode the video on the server. As mentioned in the transcoding architecture
components overview, several hardware systems exist for transcoding video. The
question then becomes one of choosing the architecture that has the optimal combination of
low cost and high scalability. What follows is a case study examining in detail the
evaluation process of a particular transcoding architecture.
YouTube Case Study
Summary:
This section contains a case study, analyzing two possible transcoding architectures. The
target video distribution system is a user generated video sharing site of the same scale as
YouTube. This case study outlines the resource requirements for transcoding video
recorded with common personal video camcorders, and then analyzes the system
requirements both in terms of the input data formats and memory resources required
during transcoding.
Background:
The case study evolved out of an interest in possibly using cable company-supplied DVR
boxes to form a distributed transcoding grid. In order to evaluate the feasibility and
possible advantage of such an architecture, it was necessary to estimate both the memory
requirements of transcoding (to determine if the boxes were even capable of transcoding),
and then to compare it against a standard transcoding farm implementation consisting of
one or more general purpose PCs dedicated to transcoding.
Input Formats:
The input video formats being considered are both standard and high definition video
recorded with personal camcorders.
Standard Definition (DV):
The majority of personal video camcorders are standard definition and adhere to the DV
(Digital Video) specification, which dictates both the digital video and audio storage
formats and the physical recording media. The most common video resolutions are
720x480 with 4:1:1 sampling for NTSC, and 720x576 with 4:2:0 sampling for PAL.
Audio is stored raw at 16 bits/sample with 2 channels @ 48 kHz or 12 bits/sample with 4
channels @ 32 kHz.
The precise total bitrate is not certain. Wikipedia [10] refers to the usage of error
correction in addition to the audio and video storage, resulting in a total bitrate of roughly
35.382 Mbits/sec. It is believed that the error correction overhead may only apply to the
physical storage layer when recording to tape. Experimentally, a raw DV recording
captured to a PC yielded a bitrate of 27.3 Mbits/sec, based on a roughly one
gigabyte video that was 5 minutes in length.
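The measured bitrate follows from dividing file size by duration. The sketch below assumes that "roughly one gigabyte" means 1024 x 10^6 bytes, which is a guess chosen because it reproduces the reported figure; the exact file size is not stated. The function name is likewise introduced only for illustration.

```python
def average_bitrate_mbps(size_bytes, duration_seconds):
    """Average bitrate in megabits per second (1 Mbit = 10**6 bits)."""
    return size_bytes * 8 / duration_seconds / 1_000_000


# Assumed size: 1024 * 10**6 bytes, recorded over 5 minutes.
dv_capture_mbps = average_bitrate_mbps(1024 * 1_000_000, 5 * 60)

# For comparison, exactly 10**9 bytes over the same duration.
decimal_gb_mbps = average_bitrate_mbps(1_000_000_000, 5 * 60)
```

Under the first assumption the result is approximately 27.3 Mbits/sec; with exactly 10^9 bytes it would be closer to 26.7 Mbits/sec, so the reported value is sensitive to how the file size was measured.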
High Definition:
Unlike with standard definition video recording which mostly adheres to the DV
specification, there is no single HD video recording format.
Formats:
• HDV: MPEG2-based. Uses same DV/miniDV tapes.
o HDV1: 720p
o HDV2: 1080i
• AVCHD: uses H.264.
• HDCAM: DCT-based (similar to DV)
• XDCAM HD/EX: MPEG2
In summary, HD formats include virtually every major video compression codec. For the
purposes of evaluation, simulated HD content will be generated by transcoding the SD
input format into 720p MPEG4 encoded @ roughly 15 megabits/sec.
YouTube Video Specifications:
YouTube videos (uploaded via the web interface) are limited to 10 minutes in length.
When uploaded individually, the maximum file size is 100 MB. YouTube recommends
using the MPEG4 (DivX, Xvid) video format, MP3 audio, and 30fps.
Internally, YouTube offers video encoded at either 320x240 (uploaded before
March ’08), or the new “High Quality” format (480x360). Videos uploaded before June
2007 were encoded with H.263, while those uploaded after were encoded with H.264.
YouTube Usage Statistics:
According to a USA Today article dated 7/16/2006, 65,000 videos were uploaded daily in
the month of July, 2006. According to a blog post by Professor Michael Wesch, Assistant
Professor of Cultural Anthropology at Kansas State University, as of March 17, 2008,
between 150 and 200 thousand videos were being uploaded daily to YouTube. Further, the
average length of an uploaded video is 167 seconds.
Required Transcoding Capacity:
Given the aforementioned statistics, roughly 338 seconds of footage are uploaded to
YouTube every second, meaning an encoding capacity of 338 times real-time would be required to
transcode all uploaded user content. At 30 fps, this equates to roughly 10,000 frames per
second.
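The arithmetic behind this estimate can be checked directly. The sketch assumes 175,000 uploads per day, the midpoint of the reported range of 150 to 200 thousand, since that value reproduces the figure of 338; the function and variable names are introduced here for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def required_realtime_factor(videos_per_day, avg_length_seconds):
    """Seconds of footage uploaded per wall-clock second."""
    return videos_per_day * avg_length_seconds / SECONDS_PER_DAY


# Assumed midpoint of the reported upload range, with the 167 second average length.
realtime_factor = required_realtime_factor(175_000, 167)

# Frames that must be transcoded per second at 30 frames per second.
required_fps = realtime_factor * 30
```

Using the ends of the reported range instead gives roughly 290 and 387 times real-time, so the estimate is an order-of-magnitude figure rather than a precise one.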
Transcoding:
Selected Inputs:
• DV Camcorder format: 720x480 (480p SD) @ 28.125Mbits/sec.
• Simulated HD Camcorder Format: MPEG4-encoded 1280x720 (720p) @
14.64Mbits/sec
• High Quality YouTube-Recommended Input Format: MPEG4-encoded 480x360
@ 493kbit/sec (maximum bitrate allowed for a 162 second clip to be under 10
megabytes).
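The 493 kbit/sec figure in the last input format is the largest average bitrate that keeps a clip of the stated length under the stated size. A small sketch of that calculation follows, assuming 1 megabyte = 10^6 bytes and treating the limit as the total bitrate of the file; the function name is hypothetical.

```python
def max_bitrate_kbps(size_limit_bytes, duration_seconds):
    """Largest average bitrate (kbit/sec) that fits the size limit."""
    return size_limit_bytes * 8 / duration_seconds / 1000


# 10 megabytes (assumed 10 * 10**6 bytes) for a 162 second clip.
bitrate_cap_kbps = max_bitrate_kbps(10 * 1_000_000, 162)
```

The result is just under 494 kbit/sec, consistent with the 493 kbit/sec used for the benchmark input.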
Target Outputs:
• YouTube-Recommended Input Format (converting from Input Format #1 →
Input Format #3).
• YouTube Flash Video Format: FLV packaged H.263, 320x240.
• YouTube High Quality: H.264 @ 480x360.
Memory Consumption and Initial Transcoding Benchmarks:
Listed below are the memory consumed by ffmpeg and the FPS (frames per second)
transcoded while running. Note that YouTube High Quality output was not benchmarked,
due to unresolved errors with ffmpeg transcoding into the target format with H.264
wrapped in an FLV container (rather than standard H.263 + FLV).
All FPS figures cited above were measured on a single-core Pentium 4 (a roughly
four-year-old computer, released in 2004).
In order to estimate the transcoding power of modern architectures, the same
transcoding was run on an Intel Core 2 Quad machine. Transcoding YouTube’s
recommended input format into H.263 FLV (YouTube’s default output format until
recently) ran at 257 fps, utilizing a single core. Extrapolating further, a quad core
machine would therefore be able to transcode roughly one thousand frames per second,
given YouTube’s recommended input format and transcoding to H.263.
Analysis:
Based upon the memory usage of ffmpeg while running on a Windows machine, it is
believed that, barring OS limitations, it should be easily possible to transcode on a cable
box with 64 MB of RAM or less. In addition, assuming the usage of Linux, it should be
possible to configure the system such that the required amount of RAM is made available.
Further, based upon rough estimates of transcoding on a single core of a server, a quad
core machine with similar Intel dies should be able to transcode roughly one thousand
frames per second, which for a 30 fps input video, equates to 33.33x realtime transcoding
capacity when transcoding from the recommended YouTube input video format to H.263
Flash-based video.
Given the estimated transcoding power of a quad core machine and YouTube usage
statistics, roughly 10 quad core machines would be required for transcoding all content
uploaded in real-time into H.263-formatted Flash YouTube videos.
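The machine count follows from the figures already given: 257 fps on one core, four cores per machine, and 338 times real-time at 30 fps. The sketch below assumes perfectly linear scaling across cores, as the extrapolation above does; the function name is introduced for illustration.

```python
import math


def machines_required(required_fps, fps_per_core, cores_per_machine):
    """Number of machines needed, assuming throughput scales linearly with cores."""
    machine_fps = fps_per_core * cores_per_machine
    return math.ceil(required_fps / machine_fps)


machine_fps = 257 * 4                  # 1028 frames per second per quad core machine
realtime_per_machine = machine_fps / 30
required_fps = 338 * 30                # 10140 frames per second for the whole service
machine_count = machines_required(required_fps, fps_per_core=257, cores_per_machine=4)
```

With these inputs the service needs 10 machines. Real deployments would need headroom for peak upload rates and for imperfect multi-core scaling, neither of which is modeled here.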
Architectural Recommendations
The resulting benchmarks and usage estimates provide a rough estimate of the order of
magnitude of computation required for transcoding for a YouTube-sized service. The
primary result is that the computing requirements do not justify a large scale attempt at
distributing the transcoding process over cable DVR boxes. Even using strictly
commodity PC hardware, it is possible to construct a real-time transcoding system of
sufficient scale.
Even considering additional higher complexity codecs, such as H.264, the transcoding
time isn’t expected to increase by much more than one order of magnitude. Further, even
with just-released custom ASICs available, it would still be more economical to use PC-based
transcoding farms, perhaps complemented with custom hardware accelerator boards.
Conclusions
A taxonomy classifying video distribution and transcoding systems has been presented.
In addition, several example systems have been examined and compared. An overview of
the architectural components available to transcoding system architects has also been
presented. Lastly, a case study detailing the architectural requirements and final
recommendations for a large scale user generated distribution system has been presented.
Future Work
The user generated content case considered the case where all content was transcoded
only once, from which all users would stream. A radically different scenario, however,
involves the situation where every single user viewing a video is provided a custom
tailored video bitstream, optimized for the viewer’s particular bandwidth and hardware.
For users on mobile devices, the resolution can be resized to fit their screen, and perhaps
the bitrate or complexity of encoding lowered to match the low computing requirements
of the device. By contrast, when running on high end PCs or dedicated consumer
electronics devices connected to home entertainment systems with large HDTVs,
maximum resolution and quality are desired.
Based on the case study results, a strictly general purpose PC-based approach simply
would not scale. There are many open questions to be pursued with respect to
architecting large scale video distribution systems in which the video isn’t transcoded. To
date, the sheer scale and computational costs associated with such a system have made it
infeasible for large scale deployment.
References:
[1] “Ellacoya Data Shows Web Traffic Overtakes Peer-to-Peer (P2P) as Largest Percentage of Bandwidth on
the Network.” Ellacoya Networks Press Release.
http://www.ellacoya.com/news/pdf/2007/NXTcommEllacoyaMediaAlert.pdf
[2] “Slingbox PRO Overview.” Sling Media. http://www.slingmedia.com/go/slingbox-pro
[3] “Video Support: Fifth Generation iPod (iPod with Video) 30 GB, 60 GB.” Apple Corp.
http://support.apple.com/specs/ipod/iPod_with_video_30_60_GB.html
[4] “Intel Core 2 Quad Processors for Desktop.” Intel Website.
http://www.intel.com/Consumer/Learn/Desktop/core2quad-detail.htm?iid=learn_proc+c2q_desktop
[5] “ATI Avivo HD Video Technology Brief.” ATI’s Website.
http://ati.amd.com/technology/Avivo/pdf/ATI_Avivo_HD_tech_brief.pdf
[6] “PureVideo Product Comparison Chart.” Nvidia.
http://www.nvidia.com/docs/CP/11036/PureVideo_Product_Comparison.pdf
[7] “Altera IP Cores.” 4i2i – The codec Specialists.
http://www.4i2i.com/index.php?option=com_content&task=view&id=25&Itemid=85
[8] “RipCode On-Demand Signaling Server.” Ripcode. http://ripcode.com/prodODSS.php
[9] “Slingbox PRO Quick Start Guide (US and Canada).” http://support.slingmedia.com/get/KB-005166.pdf
[10] DV. Wikipedia. http://lipas.uwasa.fi/~f76998/video/conversion/#introduction. Accessed 3/14/2008.
[11] DV Video Data and AVI Files. Microsoft Hardware Development Center Archives.
http://www.microsoft.com/whdc/archive/dvavi.mspx. Accessed 3/17/2008.
[12] A Quick Guide to Digital Video Resolution and Aspect Ratio Conversions.
http://lipas.uwasa.fi/~f76998/video/conversion/#introduction. Accessed 3/19/2008.
[13] DV, DVCAM, and DVCPRO formats. Adam Wilt. http://www.adamwilt.com/DV-tech.html. Accessed
3/17/2008.
[14] DVCAM Format Overview. Sony Corporation.
www.sony.ca/dvcam/pdfs/dvcam%20format%20overview.pdf. Accessed 3/20/2008.
[15] Understanding HD Formats. Waggoner, B. Microsoft Corp.
http://www.microsoft.com/windows/windowsmedia/howto/articles/UnderstandingHDFormats.aspx.
Accessed 3/20/2008.
[16] HD Recording Formats Compared. Video Experts. http://videoexpert.home.att.net/artic3/262ctab.htm.
Accessed 3/20/2008.
[17] Video-Camera Recording Format and Resolution Comparison.
http://www.martin-doppelbauer.de/video/indexEN.htm. Accessed 3/20/2008.
[18] Uploading Videos to YouTube. YouTube Help Center.
http://www.google.com/support/youtube/bin/topic.py?topic=10524.
[19] Digital Age Enterprise – YouTube, LLC. PowerPoint presentation published on Scribd.
http://www.scribd.com/docinfo/2229635?access_key=key-12g6ew1d5015wl1t80eq. Accessed 03/20/2008.
[20] YouTube Statistics. Digital Ethnography – Blog Archive. Professor Michael Wesch.
http://mediatedcultures.net/ksudigg/?p=163. Accessed 3/24/2008.