Data Storage & Data Representation
Representing Text
• Each character (letter, punctuation, etc.) is assigned a unique bit
pattern.
• ASCII: Uses patterns of 7 bits to represent most symbols used in written
English text
• Unicode: Uses patterns of 16 bits to represent the major symbols used in
languages worldwide
• ISO standard: Uses patterns of 32 bits to represent most symbols used in
languages worldwide
• ISO: International Organization for Standardization
• ASCII: American Standard Code for Information Interchange
The message “Hello.” in ASCII
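As a quick check of this encoding, the following Python sketch (an illustration, not part of the original slides) prints the bit pattern for each character of "Hello.". It stores each 7-bit ASCII code in an 8-bit byte with a leading 0, which is how ASCII is normally held in memory.

```python
# Encode each character of the message as its ASCII code, shown as an 8-bit pattern
# (7-bit ASCII padded with a leading 0 to fill one byte).
message = "Hello."

ascii_codes = list(message.encode("ascii"))            # one integer per character
bit_patterns = [format(code, "08b") for code in ascii_codes]

for ch, bits in zip(message, bit_patterns):
    print(ch, bits)
```

The first line printed is `H 01001000`, since the ASCII code for "H" is 72.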
Representing Numeric Values
• Storing numeric information in terms of encoded characters is inefficient.
• For example, consider storing the value 25. If we insist on storing it as encoded symbols in ASCII
using one byte per symbol, we need a total of 16 bits.
• Moreover, the largest number we could store using 16 bits is 99.
• However, by using binary notation we can store any integer in the range from
0 to 65535 in these 16 bits.
• Binary notation: Uses bits to represent a number in base two
• Limitations of computer representations of numeric values
• Overflow – occurs when a value is too big to be represented
• Truncation – occurs when a value cannot be represented
accurately
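The contrast between the two storage methods, and the two limitations, can be seen in a short Python sketch. Python's own integers never overflow, so the 16-bit limit is simulated here with a bit mask (an assumption made for illustration), and the floating-point sum shows a truncation (round-off) error.

```python
# 25 stored as two ASCII characters ('2' and '5'), one byte each: 16 bits
ascii_bits = "".join(format(b, "08b") for b in "25".encode("ascii"))

# 25 stored in binary notation, padded to the same 16 bits
binary_bits = format(25, "016b")

MAX_UNSIGNED_16 = 2**16 - 1  # 65535, the largest value 16 bits can hold

def add_unsigned_16(a: int, b: int) -> int:
    """Add two values as a 16-bit unsigned register would: bits beyond 16 are lost."""
    return (a + b) & 0xFFFF

overflow_result = add_unsigned_16(MAX_UNSIGNED_16, 1)  # wraps around to 0

# Truncation: 0.1 and 0.2 have no exact binary representation, so a small error appears
sum_of_tenths = 0.1 + 0.2

print(ascii_bits)        # 0011001000110101
print(binary_bits)       # 0000000000011001
print(overflow_result)   # 0
print(sum_of_tenths)     # 0.30000000000000004
```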
Representing Images
• Bit map techniques
• Pixel: short for “picture element”
• Binary Image
• Grayscale: a range of shades of gray
without apparent color.
• The darkest possible shade is black, which is the total
absence of transmitted or reflected light. The lightest
possible shade is white.
• RGB color: an additive color model in which
red, green and blue light are added together in
various ways to reproduce a broad array of colors.
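The storage cost of each pixel type follows directly from the number of bits per pixel. The Python sketch below assumes 8 bits for a gray level and 8 bits per color channel, a common choice that the slides do not specify, and uses a hypothetical 640 x 480 image.

```python
def bitmap_size_bits(width: int, height: int, bits_per_pixel: int) -> int:
    """Storage needed for an uncompressed bit map."""
    return width * height * bits_per_pixel

# Hypothetical 640 x 480 image stored three ways
binary_image = bitmap_size_bits(640, 480, 1)   # 1 bit per pixel: black or white
grayscale = bitmap_size_bits(640, 480, 8)      # 8 bits per pixel: 256 shades of gray
rgb_image = bitmap_size_bits(640, 480, 24)     # 8 bits each for red, green and blue

def pack_rgb(red: int, green: int, blue: int) -> int:
    """Pack three 8-bit intensities into one 24-bit pixel value."""
    return (red << 16) | (green << 8) | blue

orange = pack_rgb(255, 165, 0)
print(f"{orange:024b}")  # 24 bits: red byte, green byte, blue byte
```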
Disadvantage of representing images as bit maps
• An image cannot be rescaled easily to any
arbitrary size. Essentially, the only way to
enlarge the image is to make the pixels
bigger, which leads to a grainy appearance.
• (This is the technique called “digital zoom”
used in digital cameras as opposed to
“optical zoom” that is obtained by adjusting
the camera lens.)
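Digital zoom in its simplest form, nearest-neighbor pixel replication, can be sketched in a few lines of Python. The image is a hypothetical list of rows of pixel values; real cameras may interpolate between pixels instead, but in either case no new detail is created.

```python
def digital_zoom(image: list[list[int]], factor: int) -> list[list[int]]:
    """Enlarge a bit map by repeating every pixel factor x factor times."""
    zoomed = []
    for row in image:
        wide_row = [pixel for pixel in row for _ in range(factor)]  # repeat across
        for _ in range(factor):                                     # repeat down
            zoomed.append(list(wide_row))
    return zoomed

checkerboard = [[0, 1],
                [1, 0]]
for row in digital_zoom(checkerboard, 2):
    print(row)  # each original pixel is now a 2 x 2 block
```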
• Vector techniques
• Describe the image as a collection of
geometric structures, such as lines and curves.
• This is the approach used to produce the
scalable fonts that are available via today’s word
processing systems.
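By contrast, a vector image can be rescaled by multiplying its coordinates. This Python sketch represents a shape as a hypothetical list of line segments (real vector formats and font outlines also use curves); scaling changes the numbers that describe the geometry rather than a grid of pixels, so no graininess is introduced.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Line:
    """A line segment from (x1, y1) to (x2, y2)."""
    x1: float
    y1: float
    x2: float
    y2: float

    def scaled(self, factor: float) -> "Line":
        return Line(self.x1 * factor, self.y1 * factor, self.x2 * factor, self.y2 * factor)

# A triangle described as three line segments
triangle = [Line(0, 0, 4, 0), Line(4, 0, 2, 3), Line(2, 3, 0, 0)]

# Enlarging 10x keeps the shape exact; it is simply redrawn at the new size
big_triangle = [segment.scaled(10) for segment in triangle]
print(big_triangle)
```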
Representing Videos
• Videos can be encoded as a series of image frames with synchronized
audio tracks, also encoded using bits.
• Suppose you have a
• 10 minute video, 256 x 256 pixels,
• 24 bits per pixel, and 30 frames of the video per second.
• You use an encoding that stores all bits for each pixel for each frame
in the video. What is the total file size?
• The total file size is (256 × 256) pixels × 24 bits/pixel × 30 frames/second
× 60 seconds/minute × 10 minutes
= 28,311,552,000 bits, or approximately 28 Gb (gigabits, about 3.5 GB)
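The same arithmetic in Python is handy for trying other resolutions or frame rates. Gb and GB are taken here as decimal gigabits and gigabytes (10^9), which is an assumption; binary prefixes would give slightly different figures.

```python
width, height = 256, 256
bits_per_pixel = 24
frames_per_second = 30
duration_seconds = 10 * 60

bits_per_frame = width * height * bits_per_pixel
total_bits = bits_per_frame * frames_per_second * duration_seconds

print(f"{total_bits:,} bits")              # 28,311,552,000 bits
print(f"{total_bits / 1e9:.1f} Gb")        # 28.3 Gb
print(f"{total_bits / 8 / 1e9:.2f} GB")    # 3.54 GB
```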
Representing Sound
• Sound is encoded by sampling the amplitude of the sound wave at regular intervals and recording the series of values obtained.
• For example, the series 0, 1.5, 2.0, 1.5, 2.0, 3.0, 4.0, 3.0, 0 would represent a sound wave
that rises in amplitude, falls briefly, rises to a higher level, and then drops
back to 0 (Figure 1.14).
• A sample rate of 8000 samples per second has been used for years in long-distance
voice telephone communication.
• 8000 samples per second may seem to be a rapid rate, but it is not sufficient
for high-fidelity music recordings.
• For today’s musical CDs, a sample rate of 44,100 samples per second is used.
• The data obtained from each sample are represented in 16 bits (32 bits for
stereo recordings).
• Each second of music recorded in stereo therefore requires more than a million bits (44,100 × 32 = 1,411,200).
The sound wave represented by the sequence
0, 1.5, 2.0, 1.5, 2.0, 3.0, 4.0, 3.0, 0
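Sampling can be sketched in Python as well. The example below samples a hypothetical 440 Hz sine tone (chosen only for illustration) at the telephone and CD rates, stores each sample as a signed 16-bit integer, and computes the stereo CD bit rate quoted above.

```python
import math

CD_SAMPLE_RATE = 44_100      # samples per second
BITS_PER_SAMPLE = 16         # per channel
STEREO_CHANNELS = 2

bits_per_second_stereo = CD_SAMPLE_RATE * BITS_PER_SAMPLE * STEREO_CHANNELS

def sample_tone(frequency_hz: float, sample_rate: int, duration_s: float) -> list[int]:
    """Sample a pure sine tone and store each amplitude as a signed 16-bit integer."""
    count = round(sample_rate * duration_s)
    return [
        round(32767 * math.sin(2 * math.pi * frequency_hz * n / sample_rate))
        for n in range(count)
    ]

phone_quality = sample_tone(440, 8000, 0.01)          # 80 samples for 10 ms
cd_quality = sample_tone(440, CD_SAMPLE_RATE, 0.01)   # 441 samples for 10 ms
print(bits_per_second_stereo)  # 1411200
```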
Representing Sound (continued)
• MIDI : Musical Instrument Digital Interface
• MIDI encodes directions for producing music on a synthesizer
rather than encoding the sound itself.
• MIDI is widely used in electronic keyboards, for video
game sound, and for sound effects accompanying Web sites.
• A clarinet playing the note D for two seconds can be
encoded in three bytes rather than the more than two
million bits needed when the sound is sampled at a rate of 44,100 samples
per second.
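A MIDI Note On message is itself three bytes: a status byte (0x90 plus the channel number), a note number, and a velocity. The Python sketch below builds one for D above middle C (note 62); the velocity of 100 is an arbitrary choice. It is simplified: a complete performance would also need a Note Off message, timing information, and a program change selecting a clarinet sound, none of which are shown.

```python
def note_on(channel: int, note: int, velocity: int) -> bytes:
    """Build a 3-byte MIDI Note On message."""
    if not 0 <= channel <= 15:
        raise ValueError("MIDI channels are 0-15")
    if not (0 <= note <= 127 and 0 <= velocity <= 127):
        raise ValueError("note and velocity must be 0-127")
    return bytes([0x90 | channel, note, velocity])

D_ABOVE_MIDDLE_C = 62  # middle C is note 60

midi_message = note_on(channel=0, note=D_ABOVE_MIDDLE_C, velocity=100)

# Two seconds sampled at 44,100 samples/second, 32 bits per stereo sample
sampled_bits = 2 * 44_100 * 32

print(len(midi_message) * 8, "bits vs", sampled_bits, "bits")  # 24 bits vs 2822400 bits
```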
Data Compression
• For the purpose of storing or transferring data, it is often
helpful (and sometimes mandatory) to reduce the size of the
data involved while retaining the underlying information.
• The technique for accomplishing this is called data
compression.
Data Compression (continued)
• Lossy schemes are those that may lead to the loss of information.
• Lossless schemes are those that do not lose information in the
compression process.
• Lossy techniques often provide more compression than lossless ones
and are therefore popular in settings in which minor errors can be
tolerated, as in the case of images and audio.
• Relative encoding
• Dictionary encoding (Includes adaptive dictionary encoding such as
LZW encoding.)
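LZW, the adaptive dictionary method named above, builds its dictionary while it reads the input, so the dictionary does not have to be stored with the compressed data. The following is a minimal Python sketch assuming the input contains only characters with codes 0-255; practical implementations also limit the dictionary size and pack the output codes into bits.

```python
def lzw_compress(text: str) -> list[int]:
    """Emit one code per longest dictionary match, adding each new string as it is seen."""
    dictionary = {chr(i): i for i in range(256)}  # start with all single characters
    next_code = 256
    current, output = "", []
    for ch in text:
        candidate = current + ch
        if candidate in dictionary:
            current = candidate                   # keep extending the match
        else:
            output.append(dictionary[current])
            dictionary[candidate] = next_code     # learn the new string
            next_code += 1
            current = ch
    if current:
        output.append(dictionary[current])
    return output

def lzw_decompress(codes: list[int]) -> str:
    """Rebuild the same dictionary from the codes alone."""
    if not codes:
        return ""
    dictionary = {i: chr(i) for i in range(256)}
    next_code = 256
    previous = dictionary[codes[0]]
    result = [previous]
    for code in codes[1:]:
        if code in dictionary:
            entry = dictionary[code]
        elif code == next_code:                   # code defined by this very step
            entry = previous + previous[0]
        else:
            raise ValueError(f"invalid LZW code {code}")
        result.append(entry)
        dictionary[next_code] = previous + entry[0]
        next_code += 1
        previous = entry
    return "".join(result)

print(lzw_compress("ABABABA"))  # [65, 66, 256, 258]
```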
Run-length encoding
• In cases where the data being compressed consist of long sequences
of the same value, the compression technique called run-length
encoding, which is a lossless method, is popular.
• It is the process of replacing sequences of identical data elements
with a code indicating the element that is repeated and the number
of times it occurs in the sequence.
• For example, less space is required to indicate that a bit pattern
consists of 253 ones, followed by 118 zeros, followed by 87 ones than
to actually list all 458 bits.
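The bit pattern in this example can be run-length encoded with a short Python sketch built on the standard `itertools.groupby`. How the (value, count) pairs are themselves packed into bits is left open here.

```python
from itertools import groupby

def rle_encode(data):
    """Replace each run of identical elements with a (value, run_length) pair."""
    return [(value, len(list(run))) for value, run in groupby(data)]

def rle_decode(runs):
    """Expand (value, run_length) pairs back into the original sequence."""
    return [value for value, length in runs for _ in range(length)]

bits = [1] * 253 + [0] * 118 + [1] * 87
runs = rle_encode(bits)
print(runs)  # [(1, 253), (0, 118), (1, 87)]
```

Because decoding reproduces the input exactly, the method is lossless.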
Frequency-dependent encoding
• Another lossless data compression technique is frequency-
dependent encoding.
• In the English language the letters e, t, a, and i are used more
frequently than the letters z, q, and x.
• So, when constructing a code for text in the English language, space
can be saved by using short bit patterns to represent the former
letters and longer bit patterns to represent the latter ones.
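Huffman coding is a well-known way to construct such a frequency-dependent code. This Python sketch builds one from the character counts of a made-up sample string. It repeatedly merges the two least frequent groups, so rare characters end up with longer codes. Because no code is a prefix of another, the bit stream can be decoded without separators.

```python
import heapq
from collections import Counter

def huffman_codes(text: str) -> dict[str, str]:
    """Build a prefix-free code in which frequent characters get shorter bit patterns."""
    counts = Counter(text)
    if not counts:
        return {}
    if len(counts) == 1:                          # degenerate case: one distinct character
        return {next(iter(counts)): "0"}
    # Heap entries: (combined frequency, tie-breaker, {character: code so far})
    heap = [(count, i, {ch: ""}) for i, (ch, count) in enumerate(counts.items())]
    heapq.heapify(heap)
    tie_breaker = len(heap)
    while len(heap) > 1:
        freq_a, _, codes_a = heapq.heappop(heap)  # least frequent group
        freq_b, _, codes_b = heapq.heappop(heap)  # next least frequent group
        merged = {ch: "0" + code for ch, code in codes_a.items()}
        merged.update({ch: "1" + code for ch, code in codes_b.items()})
        heapq.heappush(heap, (freq_a + freq_b, tie_breaker, merged))
        tie_breaker += 1
    return heap[0][2]

def huffman_decode(bits: str, codes: dict[str, str]) -> str:
    """Read bits until they match a code; this works because the code is prefix-free."""
    by_code = {code: ch for ch, code in codes.items()}
    decoded, current = [], ""
    for bit in bits:
        current += bit
        if current in by_code:
            decoded.append(by_code[current])
            current = ""
    return "".join(decoded)

sample = "eeeeeeeetttttaaaaiiiizqx"
codes = huffman_codes(sample)
encoded = "".join(codes[ch] for ch in sample)
print(codes)
print(len(encoded), "bits instead of", 8 * len(sample))
```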
Relative Encoding
• In some cases, the stream of data to be compressed consists of units,
each of which differs only slightly from the preceding one.
• An example would be consecutive frames of a motion picture. In
these cases, techniques using relative encoding, also known as
differential encoding, are helpful.
• These techniques record the differences between consecutive data
units rather than entire units; that is, each unit is encoded in terms of
its relationship to the previous unit.
• Relative encoding can be implemented in either lossless or lossy form
depending on whether the differences between consecutive data
units are encoded precisely or approximated.
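The lossless form of relative encoding can be sketched in Python on a hypothetical list of readings: the first value is kept, then only the differences, which are small numbers that need fewer bits than the full values. A lossy variant would round the differences before storing them.

```python
from itertools import accumulate

def delta_encode(values: list[int]) -> list[int]:
    """Keep the first value, then store only each value's difference from the previous one."""
    if not values:
        return []
    return [values[0]] + [current - previous for previous, current in zip(values, values[1:])]

def delta_decode(deltas: list[int]) -> list[int]:
    """Rebuild the values by adding up the differences (a running sum)."""
    return list(accumulate(deltas))

readings = [100, 102, 101, 105, 105]
encoded = delta_encode(readings)
print(encoded)  # [100, 2, -1, 4, 0]
```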
Compressing Images GIF
• GIF (Graphics Interchange Format)
• GIF reduces the number of colors that can be assigned to a pixel to only 256.
• The red-green-blue combination for each of these colors is encoded using
three bytes, and these 256 encodings are stored in a table (a dictionary)
called the palette. Each pixel in an image can then be represented by a
single byte whose value indicates which of the 256 palette entries
represents the pixel’s color.
• GIF is a lossy compression system when applied to arbitrary images
because the colors in the palette may not be identical to the colors in the
original image.
• Good for cartoons & animations
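The palette idea can be sketched in Python: each distinct RGB color goes into a table of at most 256 entries, and each pixel is replaced by a one-byte index into that table. This sketch stops with an error when an image has more than 256 colors; a real GIF encoder would first reduce (quantize) the colors, which is where the loss comes from, and would then compress the index stream with LZW.

```python
def palette_encode(pixels: list[tuple[int, int, int]]) -> tuple[list[tuple[int, int, int]], bytes]:
    """Build a palette of at most 256 colors and replace each pixel by a 1-byte index."""
    palette: list[tuple[int, int, int]] = []
    index_of: dict[tuple[int, int, int], int] = {}
    indices = bytearray()
    for color in pixels:
        if color not in index_of:
            if len(palette) == 256:
                raise ValueError("more than 256 colors: reduce the colors first")
            index_of[color] = len(palette)
            palette.append(color)
        indices.append(index_of[color])
    return palette, bytes(indices)

def palette_decode(palette, indices: bytes):
    """Look each 1-byte index up in the palette to recover the pixel colors."""
    return [palette[i] for i in indices]

red, blue = (255, 0, 0), (0, 0, 255)
palette, indices = palette_encode([red, blue, red, red])
print(palette, list(indices))  # [(255, 0, 0), (0, 0, 255)] [0, 1, 0, 0]
```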
Compressing Images JPEG
• JPEG: Joint Photographic Experts Group
• Good for photographs and used by most cameras.
• The JPEG standard actually encompasses several methods of image
compression, each with its own goals.
• JPEG provides a lossless mode. However, JPEG’s lossless mode does not
produce high levels of compression when compared to other JPEG options.
• JPEG’s baseline standard (also known as JPEG’s lossy sequential mode) has
become the standard of choice in many applications.
• JPEG’s baseline standard takes advantage of a human eye’s limitations. In
particular, the human eye is more sensitive to changes in brightness than
to changes in color.
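One way baseline JPEG exploits this is to separate brightness from color and then keep less color detail. The Python sketch uses the RGB to YCbCr conversion defined by the JFIF file format and averages each 2x2 block of a color-difference channel. The later JPEG steps (discrete cosine transform, quantization, entropy coding) are not shown, and 2x2 averaging is one common encoder choice rather than a fixed requirement.

```python
def rgb_to_ycbcr(r: float, g: float, b: float) -> tuple[float, float, float]:
    """Split a pixel into brightness (Y) and two color-difference values (Cb, Cr)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr

def subsample_2x2(channel: list[list[float]]) -> list[list[float]]:
    """Replace each 2x2 block of a color channel by its average (even dimensions assumed)."""
    return [
        [
            (channel[i][j] + channel[i][j + 1] + channel[i + 1][j] + channel[i + 1][j + 1]) / 4
            for j in range(0, len(channel[0]), 2)
        ]
        for i in range(0, len(channel), 2)
    ]

# Brightness (Y) keeps full resolution; each color channel keeps a quarter of its samples
cb_channel = [[100, 110, 120, 130],
              [102, 108, 118, 132]]
print(subsample_2x2(cb_channel))  # [[105.0, 125.0]]
```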
Compressing Video
• MPEG (Moving Picture Experts Group)
• High definition television broadcast
• Video conferencing
• MPEG treats video as a sequence of pictures. To compress such
sequences, only some of the pictures, called I-frames, are encoded in their
entirety.
• The pictures between the I-frames are encoded using relative encoding
techniques. That is, rather than encode the entire picture, only its
distinctions from the prior image are recorded.
• The I-frames themselves are usually compressed with techniques similar to
JPEG.
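The I-frame idea can be sketched in Python with frames as hypothetical flat lists of pixel values: every `i_frame_interval`-th frame (a parameter chosen for illustration) is stored whole, and the frames in between store only their pixel differences from the previous frame. This is a simplification: real MPEG encoders use motion-compensated prediction and compress both the I-frames and the differences, which this sketch does not attempt.

```python
def encode_video(frames: list[list[int]], i_frame_interval: int) -> list[tuple[str, list[int]]]:
    """Store periodic I-frames whole; store other frames as differences from the prior frame."""
    encoded = []
    previous = None
    for index, frame in enumerate(frames):
        if index % i_frame_interval == 0:
            encoded.append(("I", list(frame)))
        else:
            encoded.append(("D", [cur - prev for cur, prev in zip(frame, previous)]))
        previous = frame
    return encoded

def decode_video(encoded: list[tuple[str, list[int]]]) -> list[list[int]]:
    """Rebuild each frame from an I-frame or from the previously rebuilt frame."""
    frames = []
    for kind, data in encoded:
        if kind == "I":
            frames.append(list(data))
        else:
            frames.append([prev + diff for prev, diff in zip(frames[-1], data)])
    return frames

clip = [[10, 10, 10], [10, 11, 10], [10, 12, 10], [50, 50, 50]]
encoded_clip = encode_video(clip, i_frame_interval=3)
print(encoded_clip)
```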
Compressing Audio
• MP3
• The best known system for compressing audio is MP3, which was
developed within the MPEG standards. In fact, the acronym MP3 is short
for MPEG layer 3.
• MP3 takes advantage of the properties of the human ear, removing those
details that the human ear cannot perceive, such as:
• Temporal masking: for a short period after a loud sound, the human
ear cannot detect softer sounds that would otherwise be audible.
• Frequency masking: a sound at one frequency tends to mask softer
sounds at nearby frequencies.