Video Transcoding Architectures

The document discusses the design and architecture of scalable internet video distribution systems, highlighting the growing demand for digital video and the need for efficient transcoding methods. It categorizes various video distribution models, including live TV broadcasts, on-demand services, user-generated content, and peer-to-peer networks, each with distinct attributes and requirements. The document also includes a case study on YouTube's transcoding needs and emphasizes the importance of hardware components in video processing.

Video Transcoding Architectures:
Designing Scalable Internet Video Distribution Systems
David Lariviere and Professor Luca Carloni

Spring 2008, Columbia University

Abstract: In recent years, the world has witnessed an explosion in consumer interest in,
and adoption of, digital and Internet-based video. As the quality and sheer quantity
of personal and publicly available online video sources continue to grow, one
must consider how to process and distribute this enormous amount
of data using novel system architectures targeted at the specific attributes and
requirements of the intended service.
Table of Contents
Taxonomy of Video Distribution Models
Taxonomy Terminology
Video Distribution System Examples
Live Internet Distribution of Existing TV Broadcast Material
On-Demand TV Episodes & Movies
User-Generated Content Video Distribution
Digital Video Recorders and Streaming Servers
Peer-to-Peer (P2P) Video File Sharing Networks
Video Transcoding Architectures
Background on Video Codecs
Compression
Decompression
Transcoding
Advantages of Transcoding
Transcoding and Distribution Architectures
Transcoding Hardware Components
General Purpose Processing
Graphics Card-Assisted Video Processing
Custom Hardware (DSPs, FPGAs, and ASICs)
Consumer Electronics Components
Hardware Components Summary
Video Distribution Architectures
Example Distribution Architectures
Personal DVR System Architecture
User Generated Video Hosting System Architecture
YouTube Case Study: Estimating Architecture Requirements
Summary
Background
Input Formats
Standard Definition (DV)
High Definition
YouTube Video Specifications
YouTube Usage Statistics
Required Transcoding Capacity
Video Capture Method
Transcoding
Memory Consumption and Initial Transcoding Benchmarks
Analysis
Architectural Recommendations
Conclusions
Future Work
Taxonomy of Video Distribution Models:
In an effort to organize a taxonomy by which one may classify and understand the
characteristics that distinguish types of video distribution systems, it is necessary to
define a set of attributes describing the requirements and inherent properties of the
systems with respect to input and output.

Taxonomy Terminology:
• Video Input Data Source: The transportation mechanism and actual content that
a video distribution system receives to be distributed.
o Volume: The number of input data sources being simultaneously inputted
into the system per some unit of time.
o Persistent Storage: The amount of storage, if any, that a video
distribution system will require to store the incoming input data source. If
none, then the input data source can be discarded immediately after
processing.
o Note: The distinction between Video Input Data Source and “Video Input”
is chosen carefully in order to emphasize that the input to the distribution
system may not be a video file. For example, in a video distribution
system designed to record and retransmit broadcast HDTV over the
internet, the input signal is a radio signal that must be captured with an
HDTV capture card before it can be converted to and processed as a video
file. The distinction is important because the additional necessity of
capturing the input with specialized decoders has implications in choosing
transcoding architectures.

• Latency Restriction: Defines a possible hard upper bound on the amount of time
between when an Input Data Source is provided and the Output Data Source is
made available by the distribution system.
o Note that latency restrictions are often but not always correlated with
downloaded vs. streamed distribution. When “downloading” content, the
entire output source is often downloaded before the end user may begin to
watch it, whereas with streaming distribution, simultaneous transmission
and viewing of the video can occur, given sufficient bandwidth.

• Video Output Data Source: The actual video content made available by the
distribution system that was derived from the Input Data Source.
o Demand: The number of simultaneous users that request the output data
source.
o Scalability: A measurement of how well a specific distribution system
will scale with the number of simultaneous users.
o Customization: Describes to what degree the output video is customized
for individual users, as determined by each user’s individual bandwidth
and quality requirements and capabilities.
o Persistent Storage: The amount of video data (if any) that must be
permanently stored and accessible by the video distribution system.
o Content Protection: Describes whether the owners of the material
being transmitted require that DRM be applied to protect their content from
unauthorized distribution.
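
The taxonomy above can also be written down as a small data structure, which makes it easier to compare distribution models side by side. The following Python sketch is only an illustration: the class and field names are hypothetical (they are not part of the original text), and the sample values are copied from the "Defining Attributes" list of the live TV broadcast example in this document.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class InputSource:
    volume: str               # number of simultaneous input sources, e.g. "Low to Moderate"
    persistent_storage: str   # storage needed for incoming data, e.g. "None"


@dataclass(frozen=True)
class OutputSource:
    demand: str
    scalability: str
    customization: str
    persistent_storage: Optional[str]  # None when no value is given
    content_protection: str


@dataclass(frozen=True)
class DistributionModel:
    name: str
    input_source: InputSource
    latency_restriction: str
    output_source: OutputSource


# Values taken from the live TV broadcast example's attribute list.
live_tv = DistributionModel(
    name="Live Internet Distribution of Existing TV Broadcast Material",
    input_source=InputSource(volume="Low to Moderate", persistent_storage="None"),
    latency_restriction="High",
    output_source=OutputSource(
        demand="High",
        scalability="Moderate to High",
        customization="Low to High",
        persistent_storage="Low",
        content_protection="Yes",
    ),
)

print(live_tv.name, "->", live_tv.latency_restriction)
```

Keeping the ratings as free-form strings mirrors the qualitative scale used in the text; an ordered enumeration would be a reasonable alternative if the models needed to be sorted or compared programmatically.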

Video Distribution System Examples

Live Internet Distribution of Existing TV Broadcast Material

Description:

Live Internet distribution of “traditional” television programming (that is, programming
already broadcast over the air to televisions) has yet to really take off. Wireless carriers
offering limited-quality TV broadcasts to cell phones are one of the more recent commercial
examples. Such a system can often be characterized by a small number
of input channels (on the order of 10^2 at most), with no persistent storage required on
either input or output, but with relatively strict latency restrictions, given the live nature of
the TV programming (including 24-hour news channels). Customer demand is expected
to be relatively high, especially when measured per output source, since few are available.
Depending entirely on the service, the output video can be highly customized to match
the specifications of the receiving device and its bandwidth.

Note that live video capture and streaming distribution systems are expected to scale
fairly well, primarily because the number of possible simultaneous video output
streams is small, allowing for efficient buffering and serving of content.

Defining Attributes:
• Input Data Source:
o Volume: Low to Moderate
o Persistent Storage: None
• Latency Restriction: High
• Output Data Source:
o Demand: High
o Scalability: Moderate to High
o Customization: Low to High
o Persistent Storage: Low
o Content Protection: Yes
On-Demand TV Episodes & Movies

Description:
On-demand, Internet-based TV and movie distribution systems have been gaining
momentum. Within the last two years, all of the major television broadcasters
(ABC, CBS, NBC, and Fox) have begun offering on-demand streaming of their video
content on their websites, with some also offering digital downloads via iTunes. In
addition, both Netflix and Blockbuster offer IP-based streaming of a small
portion of their movie libraries. Note that the existing product offerings of both the major
broadcasters and the movie rental companies are still fairly limited in the number of
videos simultaneously available for on-demand retrieval.

While convenient, the PC-based distribution mechanisms have yet to supplant traditional
broadcast television. This may soon change, however, as numerous companies introduce
products bringing Internet-based video out of the PC and into home entertainment systems
and living rooms. Examples include Microsoft’s demonstration at last year’s CES of
IPTV via the Xbox 360, and the introduction of the Apple TV, a small device connected to
a television on which one can view both iTunes-purchased movies and YouTube
videos. Netflix has also recently announced a partnership with LG to develop a set-top
box capable of receiving and displaying streamed videos.

Comparison:
Note that while similar to Live IP-based TV distribution systems, On-Demand video
distribution has several important distinctions: 1) Latency requirements are significantly
relaxed, as the input material is pre-recorded rather than provided in real time. 2) The
number of new input videos arriving per unit of time is significantly
lower than that of the live broadcast system. While broadcast systems are constantly
streaming data, no matter how repetitive the material, on-demand systems can store the
material once and then transmit it multiple times as needed. 3) Output Persistent
Storage is large, since all available output must be “On-Demand” at all times,
whereas live streaming services only need to keep the data currently being input to the
system, typically for on the order of 10^2 channels. 4) On-Demand
commercially created video content tends to be the highest quality among any of the
video distribution systems. Because the content is not being streamed and processed in
real time, it allows for higher quality transcoding.

Defining Attributes:
• Input Data Source:
o Volume: Low to Moderate
o Persistent Storage: Moderate to Large
• Latency Restriction: Low
• Output Data Source:
o Demand: High
o Scalability: Moderate to High
o Customization: Low to moderate
o Persistent Storage: Moderate to Large
o Content Protection: Yes

User-Generated Content Video Distribution

Description
• User-generated video hosting sites, such as YouTube, are enormously popular. In
fact, according to Ellacoya Networks [1], a provider of Broadband Service
Optimization technology, in June of 2007 HTTP-based streaming video surpassed
P2P traffic as the single largest consumer of internet bandwidth, accounting for 43%
of all network traffic vs. P2P’s 37%. It was further estimated that 10% of all
internet traffic was attributable solely to streaming videos on YouTube. A case
study estimating the requirements of transcoding all uploaded user content for a
YouTube-sized service can be found in the YouTube Case Study section.
• The site may then optionally transcode the input video.
• Lastly, the site makes it available for streaming.

Several architectural design decisions must be made. First, note that the service must
provide not only direct links to the video files, but also an interactive website that users
can search and interact with in order to locate and share new videos. This implies that a
substantial web hosting capability is necessary.

One of the largest open decisions involves the transcoding architecture. If the
content is to be uploaded by the user, should it be transcoded on the user’s machine
before being sent to the service, or uploaded to the service for transcoding?

Comparison
User Generated Video Portals, such as YouTube, represent an extreme boundary case in the
video distribution taxonomy. Such distribution systems must be architected with the
highest regard for both processing and preparing for distribution the vast amount
of constantly incoming user-generated video data, while also distributing the already
colossal library of videos. Further, note that while demand for the system as a whole may
be extremely high, there is expected to be a wide variance in the demand for individual
files. As covered in more detail in the case study, user generated video portals, most
likely as a result of their extremely large existing library size, do not make an extensive
effort to custom-tailor individual streams to match the parameters of clients. Further, the
quality tends to be among the lowest of any video distribution system.

Defining Attributes:
• Input Data Source:
o Volume: Extremely High
o Persistent Storage: Extremely Large
• Latency Restriction: Moderately Low
• Output Data Source:
o Demand: Extremely Low to Extremely High per file. Extremely High on
Service as a whole.
o Scalability: Extremely High
o Customization: low (one to two universal formats)
o Content Protection: No

Digital Video Recorders and Streaming Servers

Note: In this section, by “DVR/Streaming Servers” we mean only the class
of end-user devices that enable customers to record TV and remotely view recorded
and live TV streams, as distinct from commercial devices that may be used by television
broadcast companies.

Description:
End-User DVRs and Remote Access systems can be either cable company-supplied DVR
boxes, separate networked hardware components, or PC capture cards that can be utilized
to turn existing PCs into personal DVRs and streaming video servers.

As an example, consider the Slingbox [2] line of products from Slingmedia, a
manufacturer of consumer devices that allow consumers to record TV and then
either watch it on the same TV or view it remotely via another computer or cell phone
from anywhere in the world.

Comparison:
Video distribution systems built from consumer electronics products intended to allow
the recording and remote viewing of captured TV streams are very similar to the live
broadcast of TV provided by the major TV networks or possibly a third-party cable
company. The primary difference is that in the live broadcast case, video is
captured with the intent of offering a highly scalable live video stream that may be
viewed by many people, whereas in the personal DVR case, each consumer becomes
their own individual Video Distribution System, decreasing the scalability and total
content sizes.

Defining Attributes:
• Input Data Source:
o Volume: Extremely Low
o Persistent Storage: Low to Moderate
• Latency Restriction: High
• Output Data Source:
o Demand: Extremely Low
o Scalability: Extremely Low
o Customization: Medium to High.
o Persistent Storage
o Content Protection: No

Peer-to-Peer (P2P) Video File Sharing Networks:

Description:
Peer-to-Peer networks arguably formed the first massively large video distribution
networks on the internet. Not long after Napster introduced the world to the distributive
power of the internet for spreading copyrighted music, people began to share movies
in large numbers.

The evolution and history of P2P networks for the distribution of large volumes of videos
can be divided by a single historical event into two separate periods: Before and
After BitTorrent. BitTorrent is a protocol, a software program, and a company of the same
name. The protocol was first developed in the summer of 2001. BitTorrent, more than any
other P2P or other distribution mechanism, facilitated the mass transfer of large amounts of
data, especially videos. The primary genius of the BitTorrent protocol is found in the
manner in which users download and upload.

First, it helps to contrast BitTorrent with another well-known P2P protocol: Gnutella.
Gnutella was one of the first, if not the first, decentralized P2P file sharing protocols.
When a user runs a Gnutella client, their computer randomly begins searching IP
addresses looking for other Gnutella clients. Upon finding a Gnutella client, it can then
learn about other computers also on the network. More importantly, it can submit
search requests. A user’s query is passed on to other users, who in turn pass the
query on to each of the other users they are connected to, and so on, as the
query propagates throughout the network. If someone on the network is sharing a file
that matches the search description, their client then contacts the
querying client to notify it of the match. If the user indeed wants to download the file,
they then request it and the transmission starts. Note that if another user wants to
download the same file, the process repeats all over, and there is no logical relationship
between the first user currently downloading that file and the next user.
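
The query propagation described above can be sketched as a breadth-first flood over the peer graph. This is a simplified illustration rather than the Gnutella wire protocol: the function name, the in-memory dictionaries, and the sample peers are hypothetical, and the hop limit (`ttl`) is an assumption added here so that the flood terminates; the text above does not discuss how queries are bounded.

```python
from collections import deque


def flood_query(neighbors, shared_files, origin, keyword, ttl=4):
    """Return the set of peers sharing a file whose name contains `keyword`.

    neighbors:    dict mapping each peer to the peers it is connected to
    shared_files: dict mapping each peer to the file names it shares
    ttl:          maximum number of hops the query may travel (an assumption)
    """
    seen = {origin}                 # peers that have already received the query
    hits = set()
    queue = deque([(origin, ttl)])
    while queue:
        peer, hops_left = queue.popleft()
        # A peer with a matching file would contact the querying client.
        if peer != origin and any(keyword in name for name in shared_files.get(peer, ())):
            hits.add(peer)
        if hops_left == 0:
            continue                # query is not forwarded any further
        for nxt in neighbors.get(peer, ()):
            if nxt not in seen:     # forward the query only to peers that have not seen it
                seen.add(nxt)
                queue.append((nxt, hops_left - 1))
    return hits


# Hypothetical five-peer network.
neighbors = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}
shared_files = {"D": ["holiday_video.avi"], "E": ["holiday_video_hd.avi"]}

print(sorted(flood_query(neighbors, shared_files, "A", "holiday", ttl=2)))
print(sorted(flood_query(neighbors, shared_files, "A", "holiday", ttl=3)))
```

Each search walks the graph from scratch, which reflects the point made above: two users looking for the same file trigger two unrelated floods, and nothing ties their downloads together.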

BitTorrent is different. First, it works by way of “.torrent” files containing metadata and
contact information for one or more BitTorrent trackers, which are computers that
facilitate the file transmissions. Once a user downloads a .torrent file, their client
connects to the trackers, which then provide the client with a list of other clients from
which the file can be downloaded. Unlike traditional implementations of Gnutella clients
(which later adopted some BitTorrent techniques), the client downloads small portions of
the file from many users simultaneously, instead of opening a single large HTTP file
download from a single user. Most importantly, the set of all users either hosting or
downloading the file forms a “swarm” of which all users are aware.

The single most powerful aspect of BitTorrent is that, in order for a user to be given
bandwidth by the swarm with which to download the file, the user must also upload data
already downloaded to other members of the swarm; thus a user is constantly both
downloading and uploading the file at the same time. The rate at which a user is sent
data by other members will often grow linearly with the rate at which that user uploads to
other members of the swarm; hence those who contribute more bandwidth to the swarm
are prioritized over those who do not.
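
The reciprocity described above can be illustrated with a deliberately simplified model in which a peer divides its upload capacity among other peers in proportion to what each of them recently sent it. This is a sketch of the linear relationship only: the function name and the sample rates are hypothetical, and real BitTorrent clients use a choking scheme that serves a limited number of the best-contributing peers at a time rather than an exact proportional split.

```python
def allocate_upload(capacity_kbps, received_kbps):
    """Split our upload capacity among peers in proportion to what each sent us.

    capacity_kbps: total upload bandwidth we are willing to give away
    received_kbps: dict mapping each peer to the rate it recently uploaded to us
    """
    total = sum(received_kbps.values())
    if total == 0:
        # Nobody has contributed yet: fall back to an even split.
        share = capacity_kbps / len(received_kbps) if received_kbps else 0.0
        return {peer: share for peer in received_kbps}
    return {peer: capacity_kbps * rate / total for peer, rate in received_kbps.items()}


# Hypothetical swarm: peer_a uploads twice as fast to us as peer_b; peer_c sends nothing.
rates = allocate_upload(300, {"peer_a": 100, "peer_b": 50, "peer_c": 0})
print(rates)
```

Under this model a peer that doubles its contribution doubles the bandwidth it receives in return, while a peer that uploads nothing receives nothing once others are contributing.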

The introduction of BitTorrent revolutionized downloading of large video files over
decentralized P2P networks. BitTorrent, the company, has recently announced
partnerships with several movie and TV studios, with the intent of applying its
technology to facilitate commercial, highly scalable distribution of content.

Comparison:
The defining aspect of BitTorrent-like P2P architectures is the distributed storage and
distribution of content, extensively leveraging the end user. Architectures based upon this
model are capable of extreme scalability without being subject to the costs associated with
traditional large-scale hosting of content.

Defining Attributes:
• Input Data Source:
o Volume: Extremely High
o Persistent Storage: Extremely Large (Distributed)
• Latency Restriction: None
• Output Data Source:
o Demand: Moderate
o Scalability: Extremely High
o Customization: None
o Storage Requirement: Extremely High.
o Content Protection: Yes and No (usually not)
Video Transcoding Architectures

Background on Video Codecs

Note that for clarity, the following discussion will only discuss video, but without loss of
generality, as any statements made for the encoding or decoding of video are equally true
for audio.

Compression
When dealing with digital representations of videos, standardized compression
algorithms, also known as codecs, are used to compress the video, substantially reducing
the file size while ideally not reducing the perceptual quality. Video compression is the
process of compressing raw frames of video, each represented as a two-dimensional array
of pixel values, where each pixel value indicates the intensity of light that fell upon the
pixel when the image was taken. After the raw video frames are compressed, the encoder
outputs a bitstream that has been first lossily and then losslessly compressed and that
adheres to the specific video codec’s standard.

Decompression
Decompression is the inverse process to compression. A decoder takes as input the
compressed bitstream produced by the encoder, and then generates a best approximation of
the original raw video frames. Note that compression is usually not exactly invertible, as
information is typically lost in the compression stage that cannot be accurately recovered.

Transcoding
Transcoding is the process of taking an already compressed video file and changing
either the codec or the parameters of the codec (bitrate, quality, image size, etc.). This
is done by first decoding with the original codec’s decoder, and then re-encoding
the decoded buffer, using a new encoder or the same encoder with different
parameters.

Note that, by definition, transcoding is computationally more expensive than either
compression or decompression alone, since it consists of doing both.
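
The decode-then-re-encode structure can be made concrete with a toy example. The "codec" below is not a real video codec: it is a hypothetical stand-in that simply quantizes pixel values with a given step size, chosen only to show that transcoding is a decode followed by an encode with new parameters, and that each lossy pass can move the result further from the original.

```python
def encode(frame, step):
    """Toy lossy 'codec': quantize each pixel value to a multiple of `step`."""
    return {"step": step, "levels": [round(p / step) for p in frame]}


def decode(bitstream):
    """Reconstruct an approximation of the original pixel values."""
    return [level * bitstream["step"] for level in bitstream["levels"]]


def transcode(bitstream, new_step):
    """Transcoding = decode with the old parameters, then re-encode with new ones."""
    raw = decode(bitstream)
    return encode(raw, new_step)


original = [12, 47, 130, 201, 255]        # one tiny "frame" of pixel intensities
first = encode(original, step=4)          # initial compression
second = transcode(first, new_step=16)    # re-target to a coarser quality setting

print("original:        ", original)
print("after encode:    ", decode(first))
print("after transcode: ", decode(second))
```

Because `transcode` calls both `decode` and `encode`, its cost is the sum of the two, which is the point made above about computational complexity.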

Advantages of Transcoding:
Many scenarios exist in which transcoding can be quite useful:
1. Codec Upgrading: Transcoding can be done to upgrade a compressed video that
was stored in an earlier, inferior codec to a newer codec that can produce
substantially better quality video in the same amount of space, or the same quality
video with a substantially smaller number of bits.
2. Device Compatibility: Consumer electronics devices (cell phones, Video
iPods, Xbox 360s, etc.) often support only a subset of the available codecs, and only up
to a maximum quality for each of them. For example, the 5th Generation
Video iPods are capable of playing video files up to the following limits [3]:
a. H.264 encoded up to 768 Kbps @ 320x240, 30 fps, using Baseline Profile
up to Level 1.3
b. MPEG-4 Video up to 2.5 Mbps @ 480x480, 30 fps, with Simple
Profile.
Therefore, trying to play an H.264 video encoded at a resolution greater than
320x240 would fail, most likely because the iPod’s CPU is not powerful
enough to decode the file in real time. By transcoding an incompatible input
file, one can create device-compatible video files customized for each target
device. Note further that different codecs have substantially different
computing requirements, with H.264, for example, being considerably more
costly than MPEG-4.
3. Network Compatibility: When streaming video over the internet (where playback
starts before the download finishes), the available network bandwidth can often
be less than the bitrate (in bits per second) of the video file being sent. The result
is that playback stops until the client has downloaded enough video to resume
playback. When the bandwidth is consistently lower than the bitrate of the encoded
video, the video player will either halt for a long time, or
constantly switch between playing and buffering, depending on the
implementation. One advanced solution to the problem
Transcoding and Distribution Architectures

Transcoding Hardware Components:

The purpose of this section is to outline the possible hardware components that may be
used in architecting video transcoding and distribution architectures.

General Purpose Processing

By far the most common means of transcoding video is to use software running on one or
more CPU cores of the computer. Intel is certainly aware of the performance
requirements for HD video processing. Consider the list of four key features listed on the
product website of the Intel Core 2 Quad processors:

Intel Quad Core Product Website. 4

Of the four “Key Features” of the latest line of Intel Quad Core Processors, two are
directly related to HD image processing. The marketing strategy shouldn’t come as a
surprise, given that video processing is both one of the most CPU intensive tasks and one
of the best candidates for leveraging additional cores efficiently. Intel hasn’t been the
only silicon manufacturer to notice this…

Graphics Card-Assisted Video Processing

Both ATI (now owned by AMD) and Nvidia are offering accelerated HD video decoding
with their latest boards, with differing capabilities and methods by which third party
developers can take advantage of the full power of modern GPUs’ tens and even
hundreds of stream processor cores.

ATI Avivo HD
Starting with the ATI Radeon HD 2600 and 2400 lines of GPUs, ATI has introduced its
“Unified Video Decoder” (UVD), a dedicated hardware block specifically designed for
decoding, offering complete GPU-based decoding of both H.264 and VC-1 (latest
generation video codec required as part of the advanced profile for HD DVD and
Blu-ray) [5]. While graphics cards have provided basic decoding acceleration, including
MPEG2, for years, they may not have used dedicated hardware or offered the same level of
decoding acceleration. Specifically, the dedicated hardware blocks in the UVD also
support the arithmetic decoding algorithm (CABAC), which can account for a significant
portion of the CPU cycles spent on decoding, especially as its performance impact is
directly proportional to the bitstream length.

NVIDIA PureVideo HD

Nvidia’s PureVideo HD (along with PureVideo 1 and PureVideo 2) [6] technologies also
include GPU-based accelerated decoding that appears quite similar to that offered by ATI.
Most importantly, it offers full H.264 and VC-1 decoding acceleration, including CABAC
entropy decoding.

Custom Hardware (DSPs, FPGAs, and ASICs)

There are a variety of other hardware technologies for handling varying portions of video
transcoding.

Existing commercial products run the complete gamut from individual hardware IP
cores targeting a single profile of H.264 [7] all the way to complete end-to-end systems
enclosed in a single rack unit, capable of not only transcoding but also serving the
output video via an alphabet soup of different transmission protocols [8].

Consumer Electronics Components:

As more and more transistors find their way into even low-end mass-market consumer
devices, the computational power and video capabilities of consumer electronics devices
continue to increase.

The available computational power of consumer electronics can vary dramatically.
Compare, for example, the computational power of a battery-optimized cellular phone
with that of the PS3’s multi-core Cell processor (one general-purpose core plus eight
synergistic processing elements). What an increasing number of such devices
all have in common, however, is the ability to decode and encode video, with many of
them utilizing custom ASICs for at least part of the process.

Hardware Components Summary

Upon examining the variety and availability of components for designing the transcoding
and even distribution system, it becomes slightly clearer just how large a problem and
possible solution space system architects must face when designing video transcoding
and distribution systems.

The good news is that for virtually every scenario conceived, including the examples
given in the taxonomy section, highly efficient methods exist for transcoding video.
Video Distribution Architectures
A video distribution system can be partitioned into blocks, both logically and physically,
in many different ways. One method of generalizing such systems, and abstracting away
the implementation details of a particular architecture, is to consider a video distribution
system as consisting of three main blocks: 1) Video Capture/Input, 2)
Transcoding/Image Processing, and 3) Transmission/Serving of Output Content.

[Figure 1: Decomposition of Video Distribution Systems. Within the Video Distribution
System, the Input Video Source passes through three blocks (Video Capture and Input;
Transcoding & Image Processing; Distribution) to produce the Video Output.]

Note that details of the architecture may often further subdivide and complicate the
architecture, but without loss of generality, we can model a complete Video Distribution
System as a sequence of obtaining, processing, and then distributing the inputted content.
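
The three-block decomposition can be sketched as a chain of stages, each consuming the output of the previous one. The sketch below is only a structural illustration under stated assumptions: the function names are hypothetical, frames are represented as plain byte strings, and the "transcoding" step is a placeholder callable rather than a real codec.

```python
from typing import Callable, Iterable, Iterator

Frame = bytes  # placeholder representation of one unit of video data


def capture(source: Iterable[bytes]) -> Iterator[Frame]:
    """Block 1: obtain the input (e.g. from a capture card or an uploaded file)."""
    for packet in source:
        yield packet


def process(frames: Iterable[Frame], transform: Callable[[Frame], Frame]) -> Iterator[Frame]:
    """Block 2: transcoding / image processing, applied frame by frame."""
    for frame in frames:
        yield transform(frame)


def distribute(frames: Iterable[Frame], send: Callable[[Frame], None]) -> int:
    """Block 3: serve the processed output; returns the number of frames sent."""
    count = 0
    for frame in frames:
        send(frame)
        count += 1
    return count


# Stand-in for a transcoder: keep only the first two bytes of each frame.
delivered = []
sent = distribute(process(capture([b"abcd", b"efgh"]), lambda f: f[:2]), delivered.append)
print(sent, delivered)
```

Writing the stages as generators means frames flow through one at a time, which suits a live stream; a system with no latency restriction could equally run each block as a separate batch job over stored files.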

Example Distribution Architectures

The goal is now to review some of the previously described video distribution systems
taxonomy examples and illustrate possible architectures and the reasoning behind them.
Personal DVR System Architecture
For this architecture, we assume an end consumer has purchased a Slingmedia Slingbox
Pro, to be used in creating a personal video distribution system [2].

Note that the Slingbox Pro has many possible inputs, including a cable TV tuner, along
with RCA, S-video, and HDMI inputs and outputs. It also has an Ethernet port for
connecting it to a network. Once on the network and configured via a PC-based installation
program, the Slingbox is ready to serve video to external devices. In the context of
the system decomposition introduced above, the Slingbox is a self-contained single module,
responsible for video capture, transcoding, and serving of output.

Figure 2 Demonstrates Capabilities of Personal DVR based Video Distribution Server 9

User Generated Video Hosting System Architecture

Next we consider the video distribution system architecture necessary for a
YouTube-like service.

One possible (and extremely common) dataflow between the service and users is as
follows:
• Users create video data, either recording it with video cameras or generating it
directly on a PC.
• If the content was recorded with a standard digital video camera, then the
user must first capture it to a PC and transcode it in order to significantly shrink the
file size.
• Given a precompressed file, the user can now submit the file to a video hosting
site, such as YouTube.
• The site may then optionally transcode the input video.
• Lastly, the site makes it available for streaming.

If the system will transcode user-submitted content, a system architect must also decide
how to transcode the video on the server. As mentioned in the overview of transcoding
hardware components, several hardware systems exist for transcoding video. The
question then becomes one of choosing the architecture that has the optimal combination of
low cost and high scalability. What follows is a case study examining in detail the
evaluation process of a particular transcoding architecture.
YouTube Case Study
Summary:
This section contains a case study, analyzing two possible transcoding architectures. The
target video distribution system is a user generated video sharing site of the same scale as
YouTube. This case study outlines the resource requirements for transcoding video
recorded with common personal video camcorders, and then analyzes the system
requirements both in terms of the input data formats and memory resources required
during transcoding.

Background:
The case study evolved out of an interest in possibly using cable company-supplied DVR
boxes to form a distributed transcoding grid. In order to evaluate the feasibility and
possible advantage of such an architecture, it was necessary to estimate both the memory
requirements of transcoding (to determine if the boxes were even capable of transcoding),
and then to compare it against a standard transcoding farm implementation consisting of
one or more general purpose PCs dedicated to transcoding.

Input Formats:
The input video formats being considered are both standard and high definition video
recorded with personal camcorders.

Standard Definition (DV):

The majority of personal video camcorders are standard definition and adhere to the DV
(Digital Video) specification, which dictates both the digital video and audio storage
formats and physical recording mediums. The most common video resolutions are
720x480 with 4:1:1 sampling for NTSC, and 720x576 with 4:2:0 sampling for PAL.
Audio is stored raw at 16 bits/sample with 2 channels @ 48 kHz or 12 bits/sample with 4
channels @ 32 kHz.

DV specifies a relatively simple DCT-based video compression operating at a fixed
bitrate of 25 Mbits/sec, with raw 2-channel 48 kHz audio adding 1.536 Mbits/sec.
Note that the DCT-based scheme is similar in concept to the methods used for JPEG
image compression.

The precise total bitrate is not certain. Wikipedia [10] refers to the use of error
correction in addition to the audio and video storage, resulting in a total bitrate of roughly
35.382 Mbits/sec. It is believed that the error correction overhead applies only to the
physical storage layer when recording to tape. Experimental validation, capturing raw DV
video to a PC, yielded a bitrate of 27.3 Mbits/sec, based on a roughly one
gigabyte video that was 5 minutes in length.
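The measured figure can be sanity-checked from file size and duration alone. The following is a minimal Python sketch, not part of the original experiment: the function name is hypothetical, and "roughly one gigabyte" is taken as exactly 10^9 bytes, which is an assumption rather than the measured file size.

```python
def average_bitrate_mbps(size_bytes: float, duration_s: float) -> float:
    """Average bitrate in Mbit/s (1 Mbit = 10**6 bits) of a file of the given size."""
    return size_bytes * 8 / duration_s / 1e6


# Nominal DV payload from the figures above: 25 Mbit/s video + 1.536 Mbit/s audio.
nominal_dv_mbps = 25.0 + 1.536

# Assumption: exactly 10**9 bytes over 5 minutes (the text only says "roughly").
measured_estimate_mbps = average_bitrate_mbps(1e9, 5 * 60)

print(f"nominal payload:  {nominal_dv_mbps:.3f} Mbit/s")
print(f"size/duration:    {measured_estimate_mbps:.2f} Mbit/s")
```

Under these assumptions both values land between 26 and 27 Mbits/sec. A capture of about 1.02 x 10^9 bytes would account for the reported 27.3 Mbits/sec, and either value is well below the 35.382 Mbits/sec total that includes error correction.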

High Definition:
Unlike standard definition video recording, which mostly adheres to the DV
specification, high definition recording has no single format.

Formats:
• HDV: MPEG2-based. Uses same DV/miniDV tapes.
o HDV1: 720p
o HDV2: 1080i
• AVCHD: uses H.264.
• HDCAM: DCT-based (similar to DV)
• XDCAM HD/EX: MPEG2

In summary, HD formats include virtually every major video compression codec. For the
purposes of evaluation, simulated HD content will be generated by transcoding the SD
input format into 720p MPEG4 encoded @ roughly 15 megabits/sec.

YouTube Video Specifications:

YouTube videos (uploaded via the web interface) are limited to 10 minutes in length.
When uploaded individually, the maximum file size is 100 MB. YouTube recommends
using the MPEG4 (DivX, Xvid) video format with MP3 audio at 30 fps.

Internally, YouTube offers video encoded at either 320x240 (for videos uploaded before
March ’08) or the new “High Quality” format (480x360). Videos uploaded before June
2007 were encoded with H.263, while those uploaded afterwards are encoded with H.264.

YouTube Usage Statistics:

According to a USA Today article dated 7/16/2006, 65,000 videos were uploaded daily in
July 2006. According to a blog post by Michael Wesch, Assistant
Professor of Cultural Anthropology at Kansas State University, as of March 17, 2008,
between 150 and 200 thousand videos were being uploaded daily to YouTube. Further, the
average length of an uploaded video is 167 seconds.

Required Transcoding Capacity:

Given the aforementioned statistics (taking the midpoint of 175,000 uploads per day at
167 seconds each), roughly 338 seconds of footage are uploaded to YouTube every second,
meaning an encoding capacity of 338 times real-time would be required to
transcode all uploaded user content. At 30 fps, this equates to roughly 10,000 frames per
second.
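The arithmetic behind this estimate can be sketched in a few lines of Python. This is an illustrative sketch only: the function name is hypothetical, and the use of the midpoint of the reported 150 to 200 thousand daily uploads is an assumption that happens to reproduce the 338x figure.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def required_realtime_factor(uploads_per_day: float, avg_length_s: float) -> float:
    """Seconds of uploaded footage arriving per wall-clock second."""
    return uploads_per_day * avg_length_s / SECONDS_PER_DAY


# Assumption: midpoint of the reported 150,000-200,000 uploads per day.
uploads_per_day = (150_000 + 200_000) / 2
factor = required_realtime_factor(uploads_per_day, avg_length_s=167)

# Frames per second that must be transcoded, for 30 fps source video.
frames_per_second = factor * 30

print(f"{factor:.1f}x real time, about {frames_per_second:.0f} frames/s")
```

Using the two ends of the reported range instead of the midpoint gives roughly 290x and 387x real time, so the order of magnitude is unchanged.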

Video Capture Method:

Video was captured using VLC with a Sony DCR-HC21 camcorder connected to a PC via FireWire.
The method below captures raw DV video packets as transmitted directly over the IEEE 1394
protocol and saves them in raw format to disk.
1. VLC: File → Open Capture Device
2. Under Video device name, click “Refresh list”, then choose “Microsoft DV Camera and
VCR”.
3. Check “Stream/Save”, click adjacent “Settings…”
4. Check “File” under Outputs, check “Dump raw input”, choose destination output,
saving file with “.dv” extension (Note ffmpeg relies on extension for format
determination). “OK” to exit settings. “OK” to exit “Open…”.
5. At main VLC window, status bar should now read “dshow://”. Note that while
video is not displayed, it is being recorded. To stop recording, click stop button.

Transcoding:

Selected Inputs:
• DV Camcorder format: 720x480 (480p SD) @ 28.125Mbits/sec.
• Simulated HD Camcorder Format: MPEG4-encoded 1280x720 (720p) @
14.64Mbits/sec
• High Quality YouTube-Recommended Input Format: MPEG4-encoded 480x360
@ 493kbit/sec (maximum bitrate allowed for a 162 second clip to be under 10
megabytes).
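The 493 kbit/sec ceiling follows directly from the file size cap and the clip length. A minimal Python sketch of that calculation is given below; the function name is hypothetical, one megabyte is assumed to be 10^6 bytes, and the result is the total bitrate (audio and video combined), ignoring container overhead.

```python
def max_total_bitrate_kbps(size_limit_bytes: float, duration_s: float) -> float:
    """Highest average bitrate (kbit/s) that keeps a clip within a file size limit."""
    return size_limit_bytes * 8 / duration_s / 1000


# Assumption: 1 megabyte = 10**6 bytes; 10 MB cap and 162 s clip are from the text.
limit_kbps = max_total_bitrate_kbps(10 * 10**6, 162)

print(f"maximum total bitrate: {limit_kbps:.1f} kbit/s")
```

The result is just under 494 kbit/sec, consistent with the 493 kbit/sec input used for the benchmarks.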

Target Outputs:
• YouTube-Recommended Input Format (converting from Input Format #1 →
Input Format #3).
• YouTube Flash Video Format: FLV packaged H.263, 320x240.
• YouTube High Quality: H.264 @ 480x360.
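As an illustration of the second target above, the following Python sketch assembles an ffmpeg command line for converting a DV capture into a 320x240 FLV file. It is a sketch under stated assumptions, not the command used for the benchmarks: the exact options used in this study are not recorded, the file names and helper function are hypothetical, the option spellings follow current ffmpeg releases, and the MP3 encoder (`libmp3lame`) must be present in the local ffmpeg build.

```python
import shlex
from typing import List


def build_flv_command(src: str, dst: str, width: int = 320,
                      height: int = 240, fps: int = 30) -> List[str]:
    """Assemble (but do not run) an ffmpeg invocation producing H.263-style FLV."""
    return [
        "ffmpeg",
        "-i", src,                  # input; ffmpeg infers DV from the .dv extension
        "-s", f"{width}x{height}",  # output frame size
        "-r", str(fps),             # output frame rate
        "-c:v", "flv",              # ffmpeg's Sorenson Spark (H.263 variant) encoder
        "-c:a", "libmp3lame",       # MP3 audio (assumes the build includes LAME)
        "-ar", "44100",             # resample: MP3 in FLV does not accept DV's 48 kHz
        dst,
    ]


# Hypothetical file names, for illustration only.
cmd = build_flv_command("capture.dv", "output.flv")
print(shlex.join(cmd))
```

The list can be passed to `subprocess.run(cmd, check=True)` to perform the conversion on a machine with ffmpeg installed.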

Memory Consumption and Initial Transcoding Benchmarks:

Listed below is the memory consumed by ffmpeg and the FPS (frames per second)
transcoded while running. Note that YouTube High Quality output was not benchmarked,
due to unresolved errors with ffmpeg transcoding into the target format with H.264
wrapped in an FLV container (rather than standard H.263 + FLV).

                     Output: YouTube Input      Output: H.263 FLV
Input Format         Memory       FPS           Memory       FPS
DV                   8124 KB      61            7152 KB      76
HD                   12920 KB     41            12088 KB     50
YouTube Input        8736 KB      160           8136 KB      196

All FPS figures cited above were measured on a single-core Pentium 4 (a roughly
four-year-old machine, released in 2004).

In order to estimate the transcoding power of modern architectures, the same
transcoding job was run on an Intel Core 2 Quad machine. Transcoding YouTube’s
recommended input format into H.263 FLV (YouTube’s default output format until
recently) ran at 257 fps while utilizing a single core. Extrapolating further, and
assuming near-linear scaling across cores, a quad core machine would therefore be able
to transcode roughly one thousand frames per second, given YouTube’s recommended
input format and transcoding to H.263.

Analysis:
Based upon the memory usage of ffmpeg while running on a Windows machine, it is
believed that, barring OS limitations, it should easily be possible to transcode on a cable
box with 64 MB of RAM or less. In addition, assuming the use of Linux, it should be
possible to configure the system such that the required amount of RAM is made available.
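The margin can be made concrete by comparing the largest measured ffmpeg footprint against a 64 MB budget. The Python sketch below uses the benchmark figures from the table above; it assumes that "KB" means 1024 bytes and that the figures cover the ffmpeg process only, so the operating system and DVR software on a real cable box would need additional memory.

```python
# ffmpeg memory use in KB, keyed by (input format, output format), from the benchmarks.
memory_kb = {
    ("DV", "YouTube Input"): 8124,
    ("DV", "H.263 FLV"): 7152,
    ("HD", "YouTube Input"): 12920,
    ("HD", "H.263 FLV"): 12088,
    ("YouTube Input", "YouTube Input"): 8736,
    ("YouTube Input", "H.263 FLV"): 8136,
}

# Assumption: a 64 MB cable box, with 1 MB = 1024 KB.
BUDGET_KB = 64 * 1024

worst_case_kb = max(memory_kb.values())
share_of_budget = worst_case_kb / BUDGET_KB

print(f"worst case: {worst_case_kb} KB ({share_of_budget:.0%} of 64 MB)")
```

Even the most demanding case measured, HD input, uses under a fifth of the assumed budget.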

Further, based upon rough estimates of transcoding on a single core of a server, a quad
core machine with similar Intel dies should be able to transcode roughly one thousand
frames per second, which for a 30 fps input video, equates to 33.33x realtime transcoding
capacity when transcoding from the recommended YouTube input video format to H.263
Flash-based video.

Given the estimated transcoding power of a quad core machine and YouTube usage
statistics, roughly 10 quad core machines would be required for transcoding all content
uploaded in real-time into H.263-formatted Flash YouTube videos.
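The machine count can be reproduced with a short calculation. The Python sketch below is illustrative only: the function name is hypothetical, and it assumes that throughput scales linearly from the measured 257 fps on one core to four cores, which real machines may not achieve.

```python
import math


def machines_needed(load_fps: float, per_core_fps: float, cores_per_machine: int) -> int:
    """Whole machines required to sustain a given transcoding load in frames/s."""
    per_machine_fps = per_core_fps * cores_per_machine
    return math.ceil(load_fps / per_machine_fps)


SOURCE_FPS = 30
load_fps = 338 * SOURCE_FPS                    # 338x real time of 30 fps video
per_machine_realtime = 257 * 4 / SOURCE_FPS    # assumes perfect 4-core scaling

count = machines_needed(load_fps, per_core_fps=257, cores_per_machine=4)

print(f"each machine: {per_machine_realtime:.1f}x real time; machines needed: {count}")
```

With the unrounded 257 fps per core, each machine handles about 34x real time and ten machines suffice. This leaves no headroom for traffic peaks or failures, so a practical deployment would provision more.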

Architectural Recommendations
The resulting benchmarks and usage estimates provide a rough estimate of the order of
magnitude of computation required for transcoding a YouTube-sized service. The
primary result is that the computing requirements do not justify a large scale attempt at
distributing the transcoding process over cable DVR boxes. Even using strictly
commodity PC hardware, it is possible to construct a real-time transcoding system of
sufficient scale.

Even considering additional higher complexity codecs, such as H.264, the transcoding
time isn’t expected to increase by much more than one order of magnitude. Further, even
with recently released custom ASICs available, it would still be more economical to use
PC-based transcoding farms, perhaps complemented with custom hardware accelerator boards.

Conclusions
A taxonomy classifying video distribution and transcoding systems has been presented.
In addition, several example systems have been examined and compared. An overview of
the architectural components available to transcoding system architects has also been
presented. Lastly, a case study detailing the architectural requirements and final
recommendations for a large scale user generated distribution system has been presented.

Future Work
The user generated content case study considered the scenario in which all content is
transcoded only once, into a single version from which all users stream. A radically
different scenario, however, is one in which every single user viewing a video is provided
a custom tailored video bitstream, optimized for the viewer’s particular bandwidth and hardware.
For users on mobile devices, the resolution can be resized to fit their screen, and perhaps
the bitrate or complexity of encoding lowered to match the limited computing capabilities
of the device. By contrast, when running on high end PCs or dedicated consumer
electronics devices connected to home entertainment systems with large HDTVs,
maximum resolution and quality are desired.

Based on the case study results, a strictly general purpose PC-based approach simply
would not scale to this per-viewer scenario. There are many open questions to be pursued
with respect to architecting large scale video distribution systems in which video is
transcoded individually for each viewer rather than once. To
date, the sheer scale and computational costs associated with such a system have made it
infeasible for large scale deployment.
References
[1] “Ellacoya Data Shows Web Traffic Overtakes Peer-to-Peer (P2P) as Largest Percentage of Bandwidth on
the Network.” Ellacoya Networks Press Release.
http://www.ellacoya.com/news/pdf/2007/NXTcommEllacoyaMediaAlert.pdf
[2] “Slingbox PRO Overview.” Sling Media. http://www.slingmedia.com/go/slingbox-pro
[3] “Video Support: Fifth Generation iPod (iPod with Video) 30 GB, 60 GB.” Apple Corp.
http://support.apple.com/specs/ipod/iPod_with_video_30_60_GB.html
[4] “Intel Core 2 Quad Processors for Desktop.” Intel Website.
http://www.intel.com/Consumer/Learn/Desktop/core2quad-detail.htm?iid=learn_proc+c2q_desktop
[5] “ATI Avivo HD Video Technology Brief.” ATI’s Website.
http://ati.amd.com/technology/Avivo/pdf/ATI_Avivo_HD_tech_brief.pdf
[6] “PureVideo Product Comparison Chart.” Nvidia.
http://www.nvidia.com/docs/CP/11036/PureVideo_Product_Comparison.pdf
[7] “Altera IP Cores.” 4i2i – The codec Specialists.
http://www.4i2i.com/index.php?option=com_content&task=view&id=25&Itemid=85
[8] “RipCode On-Demand Signaling Server.” Ripcode. http://ripcode.com/prodODSS.php
[9] “Slingbox PRO Quick Start Guide (US and Canada).” http://support.slingmedia.com/get/KB-005166.pdf
[10] DV. Wikipedia. http://lipas.uwasa.fi/~f76998/video/conversion/#introduction. Accessed 3/14/2008.
[11] DV Video Data and AVI Files. Microsoft Hardware Development Center Archives.
http://www.microsoft.com/whdc/archive/dvavi.mspx . Accessed 3/17/2008.
[12] A Quick Guide to Digital Video Resolution and Aspect Ratio Conversions.
http://lipas.uwasa.fi/~f76998/video/conversion/#introduction . Accessed 3/19/2008.
[13] DV, DVCAM, and DVCPRO formats. Adam Wilt. http://www.adamwilt.com/DV-tech.html. Accessed
3/17/2008.
[14] DVCAM Format Overview. Sony Corporation.
www.sony.ca/dvcam/pdfs/dvcam%20format%20overview.pdf Accessed 3/20/2008.
[15] Understanding HD Formats. Waggoner, B. Microsoft Corp.
http://www.microsoft.com/windows/windowsmedia/howto/articles/UnderstandingHDFormats.aspx .
Accessed 3/20/2008.
[16] HD Recording Formats Compared. Video Experts. http://videoexpert.home.att.net/artic3/262ctab.htm .
Accessed 3/20/2008.
[17] Video-Camera Recording Format and Resolution Comparison.
http://www.martin-doppelbauer.de/video/indexEN.htm . Accessed 3/20/2008.
[18] Uploading Videos to YouTube. YouTube Help Center.
http://www.google.com/support/youtube/bin/topic.py?topic=10524 .
[19] Digital Age Enterprise – YouTube, LLC. PowerPoint presentation published on Scribd.
http://www.scribd.com/docinfo/2229635?access_key=key-12g6ew1d5015wl1t80eq . Accessed 03/20/2008.
[20] YouTube Statistics. Digital Ethnography – Blog Archive. Professor Michael Wesch.
http://mediatedcultures.net/ksudigg/?p=163 . Accessed 3/24/2008.
