DISTRIBUTED MULTIMEDIA SYSTEMS
Lecture 3: Audio Fundamentals
3.1 Introduction
Welcome to the third lecture on Multimedia data. Sound is perhaps the most important
element of multimedia. It is meaningful “speech” in any language, from a whisper to a
scream. It can provide the listening pleasure of music, the startling accent of special
effects or the ambience of a mood-setting background. Sound is the term used for the
analog form; the digitized form of sound is called audio.
3.2 Lecture objectives
a) Distinguish between sound and audio
b) Prepare the audio required for a multimedia system
c) List the different audio editing software packages
d) List the different audio file formats
3.3 Lecture outline
Power of Sound
When something vibrates in the air by moving back and forth, it creates waves of
pressure. These waves spread like the ripples from a pebble tossed into a still pool, and
when they reach the eardrum, the change of pressure, or vibration, is experienced as sound.
Acoustics is the branch of physics that studies sound. Sound pressure levels are
measured in decibels (dB); a decibel measurement is actually the ratio, on a logarithmic
scale, between a chosen reference level and the level that is actually experienced.
Multimedia Sound Systems
The multimedia application user can use sound right off the bat on both the Macintosh
and on a multimedia PC running Windows because beeps and warning sounds are
available as soon as the operating system is installed. On the Macintosh you can choose
one of several sounds for the system alert. In Windows, system sounds are WAV files
that reside in the \Windows\Media subdirectory. There are still more choices of audio
if Microsoft Office is installed. Windows uses WAV as the default file format for audio,
and Macintosh systems use SND as their default audio format.
Digital Audio
Digital audio is created when a sound wave is converted into numbers – a process
referred to as digitizing. It is possible to digitize sound from a microphone, a synthesizer,
existing tape recordings, live radio and television broadcasts, and popular CDs. You can
digitize sounds from a natural source or from prerecorded material. Digitized sound is
sampled sound: every nth fraction of a second, a sample of sound is taken and stored as
digital information in bits and bytes. The quality of this digital recording depends upon
how often the samples are taken.
Preparing Digital Audio Files
Preparing digital audio files is fairly straightforward. If you have analog source materials
– music or sound effects recorded on analog media such as cassette tapes – the first step
is to digitize the analog material by recording it onto computer-readable digital media.
It is necessary to focus on two crucial aspects of preparing digital audio files:
- Balancing the need for sound quality against your available RAM and hard disk
resources.
- Setting proper recording levels to get a good, clean recording.
Remember that the sampling rate determines the frequency at which samples will be
drawn for the recording. Sampling at higher rates more accurately captures the high
frequency content of your sound. Audio resolution determines the accuracy with which a
sound can be digitized.
Formula for determining the size of a digital audio file:
Monophonic = sampling rate × duration of recording in seconds × (bit resolution / 8) × 1
Stereo = sampling rate × duration of recording in seconds × (bit resolution / 8) × 2
The sampling rate is how often the samples are taken.
The sample size is the amount of information stored per sample. This is called the bit resolution.
The number of channels is 2 for stereo and 1 for monophonic.
The time span of the recording is measured in seconds.
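As an illustrative sketch (not part of the original notes), the formula translates directly
into a small Python function; the function name and the CD-quality parameters below are
my own choices for the example.

    def audio_file_size_bytes(sampling_rate_hz, seconds, bit_resolution, channels):
        # Size in bytes of an uncompressed recording, per the formula above.
        return sampling_rate_hz * seconds * (bit_resolution / 8) * channels

    # One minute of CD-quality stereo (44.1 kHz, 16-bit, 2 channels):
    print(audio_file_size_bytes(44100, 60, 16, 2))  # 10584000.0 bytes, about 10 MB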
Digitization
Digitization means conversion to a stream of numbers, and preferably these
numbers should be integers for efficiency.
An analog signal is a continuous measurement of a pressure wave.
To digitize, the signal must be sampled in each dimension: in time, and in amplitude.
• Sampling means measuring the quantity we are interested in, usually at evenly spaced
intervals.
• The first kind of sampling, using measurements only at evenly spaced time
intervals, is simply called sampling. The rate at which it is performed is called the
sampling frequency.
• For audio, typical sampling rates are from 8 kHz (8,000 samples per second) to
48 kHz. This range is determined by the Nyquist theorem, discussed later.
Sound is a continuous signal (measurement of pressure). Sampling in the amplitude
or voltage dimension is called quantization. We quantize so that we can represent the
signal as a discrete set of values.
Whereas frequency is an absolute measure, pitch is generally relative — a
perceptual subjective quality of sound.
Pitch and frequency are linked by setting the note A above middle C to exactly 440
Hz.
An octave above that note takes us to another A note. An octave corresponds to
doubling the frequency. Thus with the middle “A” on a piano (“A4” or “A440”) set
to 440 Hz, the next “A” up is at 880 Hz, or one octave above.
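As a small illustrative sketch (my own addition), the octave-doubling rule is easy to
compute in Python:

    a4 = 440.0  # the A above middle C, in Hz
    for shift in range(0, 3):
        print("A%d: %.0f Hz" % (4 + shift, a4 * 2 ** shift))
    # A4: 440 Hz, A5: 880 Hz, A6: 1760 Hz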
Harmonics: any series of musical tones whose frequencies are integral multiples of
the frequency of a fundamental tone.
If we allow non-integer multiples of the base frequency, we allow non-“A” notes and
have a more complex resulting sound.
The Nyquist theorem states how frequently we must sample in time to be able to recover
the original sound.
• Fig. 6.4(a) shows a single sinusoid: it is a single, pure frequency (only
electronic instruments can create such sounds).
• If the sampling rate exactly equals the actual frequency, Fig. 6.4(b) shows that a false
signal is detected: it is simply a constant, with zero frequency.
• Now if we sample at 1.5 times the actual frequency, Fig. 6.4(c) shows that we obtain
an incorrect (alias) frequency that is lower than the correct one — it is half the correct
one (the wavelength, from peak to peak, is double that of the actual signal).
• Thus for correct sampling we must use a sampling rate equal to at least twice
the maximum frequency content in the signal. This rate is called the Nyquist rate.
Nyquist Theorem: If a signal is band-limited, i.e., there is a lower limit f1 and an upper
limit f2 of frequency components in the signal, then the sampling rate should be at least
2(f2 − f1).
• Nyquist frequency: half of the Nyquist rate.
– Since it would be impossible to recover frequencies higher than the Nyquist frequency
in any event, most systems have an anti-aliasing filter that restricts the frequency content
of the input to the sampler to a range at or below the Nyquist frequency.
Aliasing
The relationship among the sampling frequency, true frequency, and alias frequency
is as follows:
f_alias = f_sampling − f_true, for f_true < f_sampling < 2 × f_true
For example, if the true frequency is 5.5 kHz and the sampling frequency is 8 kHz,
the alias frequency is 8 − 5.5 = 2.5 kHz.
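A minimal NumPy sketch (my own illustration, not from the lecture) confirms this
numerically: sampling a 5.5 kHz tone at 8 kHz produces a spectral peak at the 2.5 kHz
alias rather than at the true frequency.

    import numpy as np

    fs, f_true = 8000, 5500             # sampling rate and true tone, in Hz
    t = np.arange(fs) / fs              # one second of sample times
    x = np.sin(2 * np.pi * f_true * t)  # the tone, sampled too slowly

    spectrum = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), 1 / fs)
    print(freqs[np.argmax(spectrum)])   # 2500.0 Hz = 8000 - 5500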
Signal to Noise Ratio (SNR)
The ratio of the power of the correct signal to the power of the noise is called the
signal-to-noise ratio (SNR) — a measure of the quality of the signal.
• The SNR is usually measured in decibels (dB), where 1 dB is a tenth of a bel. The
SNR value, in units of dB, is defined in terms of base-10 logarithms of squared
amplitudes, as follows:
SNR = 10 log10 (V_signal² / V_noise²) = 20 log10 (V_signal / V_noise)
For example, if the signal amplitude V_signal is 10 times the noise amplitude, then the
SNR is 20 × log10(10) = 20 dB. Note that dB is always defined in terms of a ratio.
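As a quick sketch (my own addition), the definition translates directly into Python:

    import math

    def snr_db(v_signal, v_noise):
        # Signal-to-noise ratio in dB, computed from the two amplitudes.
        return 20 * math.log10(v_signal / v_noise)

    print(snr_db(10.0, 1.0))  # 20.0, matching the example above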
The usual levels of sound we hear around us are described in terms of decibels, as a
ratio to the quietest sound we are capable of hearing. Table 6.1 shows approximate
levels for these sounds.
Sound                      Level (dB)
Threshold of hearing            0
Rustle of leaves               10
Very quiet room                20
Average room                   40
Conversation                   60
Busy street                    70
Loud radio                     80
Train through station          90
Riveter                       100
Threshold of discomfort       120
Threshold of pain             140
Damage to eardrum             160
Merits of dB
1. The decibel's logarithmic nature means that a very large range of ratios can be
represented by a convenient number, allowing one to clearly visualize huge changes
in some quantity.
2. The mathematical properties of logarithms mean that the overall decibel gain of a multi-
component system (such as consecutive amplifiers) can be calculated simply by summing
the decibel gains of the individual components, rather than needing to multiply the
amplification factors. Essentially this is because log(A × B × C × ...) = log(A) + log(B) +
log(C) + ..., as the sketch below illustrates.
3. The human perception of sound is such that a doubling of actual intensity causes
perceived intensity to increase by roughly the same amount, irrespective of the original
level. The decibel's logarithmic scale, in which a doubling of power or intensity
always causes an increase of approximately 3 dB, corresponds to this perception.
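Here is the sketch of property 2 promised above (the gain factors are invented for
illustration):

    import math

    gains = [10.0, 4.0, 2.5]  # power-amplification factors of three cascaded stages
    db_sum = sum(10 * math.log10(g) for g in gains)
    db_of_product = 10 * math.log10(math.prod(gains))
    print(db_sum, db_of_product)  # both about 20.0 dB: the log turns the product into a sum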
Signal to Quantization Noise Ratio (SQNR)
• Aside from any noise that may have been present in the original analog signal, there is
also an additional error that results from quantization.
(a) If voltages are actually in the range 0 to 1 but we have only 8 bits in which to store
values, then effectively we force all continuous values of voltage into only 256 different
values.
(b) This introduces a round-off error. It is not really “noise”; nevertheless it is called
quantization noise (or quantization error).
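A brief NumPy sketch of the round-off error (the random voltages are invented for
illustration): quantizing values in [0, 1) to 8 bits never produces an error larger than
half a quantization interval.

    import numpy as np

    rng = np.random.default_rng(0)
    v = rng.random(100000)                   # "analog" voltages in [0, 1)
    q = np.round(v * 255) / 255              # force them onto 256 levels (8 bits)
    print(np.abs(v - q).max() <= 0.5 / 255)  # True: error is at most half an interval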
The quality of the quantization is characterized by the Signal to Quantization Noise Ratio
(SQNR).
(a) Quantization noise: the difference between the actual value of the analog
signal, for the particular sampling time, and the nearest quantization
interval value.
(b) At most, this error can be as much as half of the interval.
(c) For a quantization accuracy of N bits per sample, the SQNR can be simply expressed:
SQNR = 20 log10 (V_signal / V_quan_noise) = 20 log10 (2^(N−1) / (1/2))
     = 20 N log10 2 ≈ 6.02 N (dB)
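As a quick check (my own sketch), the 6.02 dB-per-bit rule gives the familiar figures for
common sample sizes; the roughly 96 dB figure for 16 bits is the often-quoted dynamic
range of CD audio.

    for n in (8, 16, 24):
        print(n, "bits ->", round(6.02 * n, 1), "dB")
    # 8 bits -> 48.2 dB, 16 bits -> 96.3 dB, 24 bits -> 144.5 dB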
Editing Digital Recordings
Once a recording has been made, it will almost certainly need to be edited. The
basic sound editing operations that most multimedia producers need are described
in the list that follows; a minimal sketch of a few of these operations appears after the list.
1. Multiple Tracks: The ability to edit and combine multiple tracks and then merge the
tracks and export them in a final mix to a single audio file.
2. Trimming: Removing dead air or blank space from the front of a recording
and any unnecessary extra time from the end is your first sound editing task.
3. Splicing and Assembly: Using the same tools mentioned for trimming, you will
probably want to remove the extraneous noises that inevitably creep into a recording.
4. Volume Adjustments: If you are trying to assemble ten different recordings into
a single track, there is little chance that all the segments will have the same volume.
5. Format Conversion: In some cases your digital audio editing software might
read a format different from that read by your presentation or authoring program.
6. Resampling or downsampling: If you have recorded and edited your sounds at
16-bit resolution but are delivering them at lower rates or resolutions, you must
resample or downsample the file.
7. Equalization: Some programs offer digital equalization capabilities that allow
you to modify a recording's frequency content so that it sounds brighter or darker.
8. Digital Signal Processing: Some programs allow you to process the signal with
reverberation, multitap delay, and other special effects using DSP routines.
9. Reversing Sounds: Another simple manipulation is to reverse all or a portion of a
digital audio recording. Sounds can produce a surreal, otherworldly effect when
played backward.
10. Time Stretching: Advanced programs let you alter the length of a sound
file without changing its pitch. This feature can be very useful, but watch out:
most time-stretching algorithms will severely degrade the audio quality.
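As promised above, here is a minimal NumPy sketch of a few of these operations on raw
samples (the test tone and cut points are invented for illustration; a real editor works
on actual recordings and low-pass filters before downsampling):

    import numpy as np

    fs = 44100
    t = np.arange(fs) / fs                       # one second of sample times
    samples = 0.5 * np.sin(2 * np.pi * 440 * t)  # stand-in for a real recording

    trimmed = samples[2000:-2000]                # 2. trimming dead air at both ends
    louder = np.clip(trimmed * 1.5, -1.0, 1.0)   # 4. volume adjustment, kept in range
    backwards = louder[::-1]                     # 9. reversing the sound
    halved = louder[::2]                         # 6. naive 2:1 downsampling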
Making MIDI Audio
MIDI (Musical Instrument Digital Interface) is a communication standard developed for
electronic musical instruments and computers. MIDI files allow music and sound
synthesizers from different manufacturers to communicate with each other by sending
messages along cables connected to the devices. Creating your own original score can be
one of the most creative and rewarding aspects of building a multimedia project, and
MIDI is the quickest, easiest and most flexible tool for this task. The process of creating
MIDI music is quite different from digitizing existing audio. To make MIDI scores,
however, you will need sequencer software and a sound synthesizer. A MIDI keyboard is
also useful to simplify the creation of musical scores.
scores. An advantage of structured data such as MIDI is the ease with which the music
director can edit the data. A MIDI file format is used in the following circumstances:
When digital audio will not work because of memory constraints or processing-power
requirements
When a high-quality MIDI sound source is available
When there is no requirement for dialogue
A digital audio file format is preferred in the following circumstances:
When there is no control over the playback hardware
When sufficient computing resources and bandwidth are available to handle digital files
When dialogue is required
Audio File Formats
A file format determines which applications can be used to open a file. Following
is a list of different audio file formats and the software or systems with which they
are typically used:
1. *.AIF, *.SDII in Macintosh Systems
2. *.SND for Macintosh Systems
3. *.WAV for Windows Systems
4. MIDI files – used by both Macintosh and Windows
5. *.WMA – Windows Media Player
6. *.MP3 – MP3 audio
7. *.RA – RealPlayer
8. *.VOC – VOC sound
9. *.AIFF – sound format for Macintosh sound files
10. *.OGG – Ogg Vorbis
Red Book Standard
The method for digitally encoding the high-quality stereo of the consumer CD music
market is an international standard, ISO 10149, also called the Red Book standard.
The developers of this standard claim that the digital audio sample size and sampling rate
of Red Book audio allow accurate reproduction of all the sounds that humans can hear.
The Red Book standard recommends audio recorded at a sample size of 16 bits and a
sampling rate of 44.1 kHz.
Software used for Audio
Software such as Toast and CD-Creator from Adaptec can translate the digital files of
Red Book audio format on consumer compact discs directly into a digital sound editing
file, or decompress MP3 files into CD-Audio. There are several tools available for
recording audio. Following is a list of different software that can be used for recording
and editing audio:
Sound Recorder from Microsoft
Apple's QuickTime Player Pro
Sonic Foundry's Sound Forge for Windows
SoundEdit 16
3.4 End of lecture activities (self-tests)
1. Record a one-minute audio clip using Sound Recorder in Microsoft Windows.
Note down the size of the file. Using any audio compression software, convert the
recorded file to MP3 format and compare the sizes.
2. (a) Why is file or data compression necessary for multimedia activities?
(b) Briefly explain how the Discrete Cosine Transform operates, and why it is so
important for data compression in multimedia applications.
3. (a) What are the differences between analog signals and digital signals?
(b) Audio signals are often sampled at different rates. CD-quality audio is sampled at
a 44.1 kHz rate, while telephone-quality audio is sampled at 8 kHz. What are the
maximum frequencies in the input signal that can be fully recovered for these two
sampling rates? Briefly describe the theory you use to obtain the results.
4. (a) What is MIDI?
(b) What features of MIDI make it suitable for use in the MPEG-4
audio compression standard?
(c) Briefly outline the MPEG-4 structured audio standard.
(d) What features of MIDI make it suitable for controlling software or hardware
devices?
(e) With relation to controlling devices, what limitations does MIDI have in terms
of the level of control, the number of devices, and the number of independent
control items within a device? Suggest a solution that can be employed to remedy
each of these problems using standard MIDI devices.