
BASICS OF SOUND AND HEARING

Author:
Prof. Ibrahim ELnoshokaty
Introduction

Every day your world is filled with a multitude of sounds. Sound can let
you communicate with others or let others communicate with you. It can
be a warning of danger or simply an enjoyable experience. Some sounds
can be heard by dogs or other animals but cannot be heard by humans. The
ability to hear is definitely an important sense, but people who are deaf are
remarkable in the ways that they can compensate for their loss of hearing.
Every sound you hear occurs because mechanical energy produced by a
source, such as a loudspeaker, is transferred to your ear through the movement
of atomic particles. Sound is a
pressure disturbance that moves through a medium in the form of mechani-
cal waves. When a force is exerted on an atom, it moves from its rest or equi-
librium position and exerts a force on the adjacent particles. These adjacent
particles are moved from their rest position and this continues throughout
the medium. This transfer of energy from one particle to the next is how
sound travels through a medium. The words “mechanical wave” are used to
describe the distribution of energy through a medium by the transfer of en-
ergy from one particle to the next. Waves of sound energy move outward in
all directions from the source. Your vocal cords and the strings on a guitar
are both sources which vibrate to produce sound waves. Without energy,
there would be no sound. Let’s take a closer look at sound waves.
Sound or pressure waves are made up of compressions and rarefactions.
Compression happens when particles are forced, or pressed, together. Rar-
efaction is just the opposite: it occurs when particles are given extra space
and allowed to expand. Remember that sound is a type of kinetic energy.
As the particles are moved from their rest position, they exert a force on
the adjacent particles and pass on the kinetic energy. Thus sound energy
travels outward from the source. Sound travels through air, water, or a block
of steel; thus, all are mediums for sound. Without a medium there are no
particles to carry the sound waves. The word “particle” suggests a tiny con-
centration of matter capable of transmitting energy. A particle could be an
atom or molecule. In places like space, where there is no atmosphere, there
are too few atomic particles to transfer the sound energy. Let's look at the
example of a stereo speaker. To produce sound, a thin surfaced cone, called
a diaphragm, is caused to vibrate using electromagnetic energy. When the
diaphragm moves to the right, its energy pushes the air molecules on the
right together, opening up space for the molecules on the left to move into.
We call the molecules on the right compressed and the molecules on the
left rarefied. When the diaphragm moves to the left, the opposite happens.
Now, the molecules to the left become compressed and the molecules to the
right are rarefied. These alternating compressions and rarefactions produce
a wave. One compression together with one rarefaction makes up one wavelength.
Different sounds have different wavelengths.

As the diaphragm vibrates back and forth, the sound waves produced
move the same direction (left and right). Waves that travel in the same direc-
tion as the particle movement are called longitudinal waves. Longitudinal
sound waves are the easiest to produce and have the highest speed. However,
it is possible to produce other types. Waves which move perpendicular to
the direction of particle movement are called shear waves or transverse waves.
Shear waves travel at slower speeds than longitudinal waves, and can only
be made in solids. Think of a stretched-out slinky: you can create a longitu-
dinal wave by quickly pushing and pulling one end of the slinky. This causes
longitudinal waves to form and propagate to the other end. A shear wave
can be created by taking one end of the slinky and moving it up and down.
This generates a wave that moves up and down as it travels the length of the
slinky. Another type of wave is the surface wave. Surface waves travel at the
surface of a material with the particles moving in elliptical orbits. They are
slightly slower than shear waves and fairly difficult to make. A final type
of sound wave is the plate wave. The particles of these waves also move in
elliptical orbits, but plate waves can only be created in very thin pieces of
material.

Sound and speed

If you have ever been to a baseball game or sat far away from the stage
during a concert, you may have noticed something odd. You saw the batter
hit the ball, but did not hear the crack of the impact until a few seconds
later. Or, you saw the drummer strike the drum, but it took an extra mo-
ment before you heard it. This is because the speed of sound is slower than
the speed of light, which we are used to seeing. The same thing is at work
during a thunderstorm. Lightning and thunder both happen at the same
time. We see the lightning almost instantaneously, but it takes longer to hear
the thunder. How much longer it takes to hear the thunder tells us how
far away the storm is. The longer it takes to hear the thunder, the farther the
distance its sound had to travel and the farther away the storm is. The flash
of light from lightning travels at about 300,000 kilometers per second or
186,000 miles per second. This is why we see it so much sooner than we hear
the thunder. If lightning occurs a kilometer away, the light arrives almost
immediately (1/300,000 of a second) but it takes sound nearly 3 seconds
to arrive. If you prefer to think in terms of miles, it takes sound nearly 5
seconds to travel 1 mile. Next time you see lightning count the number of
seconds before the thunder arrives, then divide this number by 5 to find out
how many miles away the lightning is (or divide by 3 for kilometers).
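This rule of thumb is easy to check numerically. The short Python sketch below
is only an illustration (the 343 m/s figure is the 20°C value from the table that
follows, and the function name is made up for the example):

    def storm_distance_m(seconds_to_thunder, speed_of_sound=343.0):
        """Estimate the distance to a lightning strike from the thunder delay.

        speed_of_sound is in m/s (about 343 m/s in 20 C air), so the result
        is in meters; divide by 1000 for kilometers or by 1609 for miles.
        """
        return seconds_to_thunder * speed_of_sound

    # A 5 second delay puts the strike roughly 1.7 km (about 1 mile) away.
    print(storm_distance_m(5.0) / 1000.0)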
Imagine you are in a long mining tunnel deep under the earth, with a friend
several thousand feet away from you in the tunnel. Using a walkie-talkie, you
tell this person to yell and clang on the pipes on the tunnel floor at the same
time. Because sound travels much faster through the solid pipe than through
the air, you hear the clang through the pipe before you hear the yell.

Material          Speed of Sound
Rubber            60 m/s
Air at 40°C       355 m/s
Air at 20°C       343 m/s
Lead              1210 m/s
Gold              3240 m/s
Glass             4540 m/s
Copper            4600 m/s
Aluminum          6320 m/s

The speed of sound is not always the same. Remember that sound is a
vibration of kinetic energy passed from molecule to molecule. The closer
the molecules are to each other and the tighter their bonds, the less time
it takes for them to pass the sound to each other and the faster sound can
travel. It is easier for sound waves to go through solids than through liquids
because the molecules are closer together and more tightly bonded in solids.
Similarly, it is harder for sound to pass through gases than through liquids,
because gaseous molecules are farther apart. The speed of sound is faster in
solid materials and slower in liquids or gases. The velocity of a sound wave is
affected by two properties of matter: the elastic properties and density. The
relationship is described by the following equation:

V = √(Cij / ρ)

Where: Cij is the elastic constant of the material and ρ is the density.
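As a rough numerical illustration of this formula (not part of the original text),
the sketch below uses an assumed longitudinal elastic constant of about 1.08 x 10^11 Pa
and a density of 2700 kg/m^3 for aluminum, values chosen because they reproduce the
6320 m/s figure in the table above:

    import math

    def sound_speed(elastic_constant, density):
        """Wave speed V = sqrt(Cij / rho).

        elastic_constant is in pascals (N/m^2) and density is in kg/m^3,
        giving a speed in m/s.
        """
        return math.sqrt(elastic_constant / density)

    # Assumed values for aluminum: C ~ 1.08e11 Pa, rho ~ 2700 kg/m^3.
    print(sound_speed(1.08e11, 2700.0))   # about 6320 m/s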


Elastic Properties
The speed of sound is also different for different types of solids, liquids,
and gases. One of the reasons for this is that the elastic properties are dif-
ferent for different materials. Elastic properties relate to the tendency of a
material to maintain its shape and not deform when a force is applied to it.
A material such as steel will experience a smaller deformation than rubber
when a force is applied to the materials. Steel is a rigid material while rubber
deforms easily and is a more flexible material.
At the particle level, a rigid material is characterized by atoms and/or
molecules with strong forces of attraction for each other. These forces can be
thought of as springs that control how quickly the particles return to their
original positions. Particles that return to their resting position quickly
are ready to move again more quickly, and thus they can vibrate at higher
speeds. Therefore, sound can travel faster through mediums with higher
elastic properties (like steel) than it can through solids like rubber, which
have lower elastic properties.

The phase of matter has a large impact upon the elastic properties of
a medium. In general, the bond strength between particles is strongest in
solid materials and is weakest in the gaseous state. As a result, sound waves
travel faster in solids than in liquids, and faster in liquids than in gases.
While the density of a medium also affects the speed of sound, the elastic
properties have a greater influence on the wave speed.

Density

The density of a medium is the second factor that affects the speed of
sound. Density describes the mass of a substance per unit volume; a denser
substance has more mass packed into the same volume. Usually, larger
molecules have more mass. If a material is more dense because its molecules
are larger, it will transmit sound slower. Sound waves are made up of kinetic
energy. It takes more energy to make large molecules vibrate than it does
to make smaller molecules vibrate. Thus, sound will travel at a slower rate
in the more dense object if they have the same elastic properties. If sound
waves were passed through two materials with approximately the same elas-
tic properties, such as aluminum (10 x 10^6 psi) and gold (10.8 x 10^6 psi),
sound will travel about twice as fast in the aluminum (0.632 cm/microsecond)
as in the gold (0.324 cm/microsecond). This is because the aluminum has a
density of 2.7 grams per cubic cm, which is less than the density of gold, which
is about 19 grams per cubic cm. The elastic properties usually have a larger
effect than the density, so it is important to consider both material properties.

Air Density and Temperature

Suppose that two volumes of a substance such as air have different densi-
ties. We know the more dense substance must have more mass per volume.
More molecules are squeezed into the same volume, therefore, the mole-
cules are closer together and their bonds are stronger (think tight springs).
Since sound is more easily transmitted between particles with strong bonds
(tight springs), sound travels faster through denser air.
However, you may have noticed from the table above that sound travels
faster in the warmer 40°C air than in the cooler 20°C air. This doesn't seem
right because the cooler air is more dense. However, in gases, an increase
in temperature causes the molecules to move faster, and this accounts for the
increase in the speed of sound. This will be discussed in more detail on the
next page.

Temperature and the speed of sound

Temperature is also a condition that affects the speed of sound. Heat,
like sound, is a form of kinetic energy. Molecules at higher temperatures
have more energy, thus they can vibrate faster. Since the molecules vibrate
faster, sound waves can travel more quickly. The speed of sound in room
temperature air is 346 meters per second. This is faster than 331 meters per
second, which is the speed of sound in air at freezing temperatures.
The formula to find the speed of sound in air is as follows:

v = 331 m/s + (0.6 m/s per °C) * T

where v is the speed of sound and T is the temperature of the air in °C. One thing to
keep in mind is that this formula finds the average speed of sound for any
given temperature. The speed of sound is also affected by other factors such
as humidity and air pressure.
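For quick estimates, the formula can be written as a one-line function (a minimal
sketch; the function name is just for illustration):

    def speed_of_sound_in_air(temp_celsius):
        """Average speed of sound in air: v = 331 m/s + (0.6 m/s per deg C) * T."""
        return 331.0 + 0.6 * temp_celsius

    print(speed_of_sound_in_air(0))    # 331.0 m/s, air at freezing
    print(speed_of_sound_in_air(25))   # 346.0 m/s, room temperature air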

Human ear
The human ear has three main sections, which consist of the outer ear,
the middle ear, and the inner ear. Sound waves enter your outer ear and
travel through your ear canal to the middle ear. The ear canal channels the
waves to your eardrum, a thin, sensitive membrane stretched tightly over
the entrance to your middle ear. The waves cause your eardrum to vibrate.
It passes these vibrations on to the hammer, one of three tiny bones in your
ear. The hammer vibrating causes the anvil, the small bone touching the
hammer, to vibrate. The anvil passes these vibrations to the stirrup, another
small bone which touches the anvil. From the stirrup, the vibrations pass
into the inner ear. The stirrup touches a liquid-filled sac, and the vibrations
travel into the cochlea, which is shaped like a shell. Inside the cochlea, there
are hundreds of special cells attached to nerve fibers, which can transmit
information to the brain. The brain processes the information from the ear
and lets us distinguish between different types of sounds. As you know, there
are many different sounds. Fire alarms are loud, whispers are soft, sopranos
sing high, tubas play low, every one of your friends has a different voice. The
differences between sounds are caused by intensity, pitch, and tone.

Ear and Hearing

The Tympanic Membrane

The tympanic membrane or “eardrum” receives vibrations traveling up
the auditory canal and transfers them through the tiny ossicles to the oval
window, the port into the inner ear.
The eardrum is some fifteen times larger than the oval window of the
inner ear, giving an amplification of about fifteen compared to a case where
the sound pressure interacted with the oval window alone.
The tympanic membrane is very thin, about 0.1 mm, but it is resilient
and strong (Zemlin). It is made up of three layers: the outer layer of skin, a
layer of fibrous connective tissue, and a layer of mucous membrane (Clark
& Martin).

Intensity

Sound is a wave and waves have amplitude, or height. Amplitude is a
measure of energy. The more energy a wave has, the higher its amplitude.
As amplitude increases, intensity also increases. Intensity is the amount of
energy a sound has over an area. The same sound is more intense if you
hear it in a smaller area. In general, we call sounds with a higher intensity
louder. We are used to measuring the sounds we hear in loudness. The sound
of your friend yelling is loud, while the sound of your own breathing is
very soft. Loudness cannot be assigned a specific number, but intensity can.
Intensity is measured in decibels. The human ear is more sensitive to high
sounds, so they may seem louder than a low noise of the same intensity.
Decibels and intensity, however, do not depend on the ear. They can be
measured with instruments. A whisper is about 10 decibels while thunder
is 100 decibels. Listening to loud sounds, sounds with intensities above 85
decibels, may damage your ears. If a noise is loud enough, over 120 deci-
bels, it can be painful to listen to. One hundred and twenty decibels is the
threshold of pain.
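Decibel values compare a sound's intensity to the threshold of hearing, conventionally
taken as 10^-12 watts per square meter (a standard reference value, not stated above).
A minimal sketch of the conversion:

    import math

    I0 = 1e-12   # threshold of hearing, W/m^2 (standard reference intensity)

    def intensity_to_decibels(intensity):
        """Sound intensity level in decibels: 10 * log10(I / I0)."""
        return 10.0 * math.log10(intensity / I0)

    print(intensity_to_decibels(1e-11))   # 10 dB, roughly a whisper
    print(intensity_to_decibels(1e-2))    # 100 dB, roughly thunder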

Sounds and their Decibels

Source of Sound Decibels


Boeing 747 140
Civil Defense Siren 130
Jack Hammer 120
Rock Concert 110
Lawn Mower 100
Motorcycle 90
Garbage Disposal 80
Vacuum Cleaner 70
Normal Conversation 60
Light Traffic 50
Background Noise 40
Whisper 30

Pitch

Pitch helps us distinguish between low and high sounds. Imagine that
a singer sings the same note twice, one an octave above the other. You can
hear a difference between these two sounds. That is because their pitch is
different.Pitch depends on the frequency of a sound wave. Frequency is the
number of wavelengths that fit into one unit of time. Remember that a wave-
length is equal to one compression and one rarefaction. Even though the
singer sang the same note, because the sounds had different frequencies, we
heard them as different. Frequencies are measured in hertz. One hertz is
equal to one cycle of compression and rarefaction per second. High sounds
have high frequencies and low sounds have low frequencies. Thunder has a
frequency of only 50 hertz, while a whistle can have a frequency of 1,000
hertz. The human ear is able to hear frequencies of 20 to 20,000 hertz. Some
animals can hear sounds at even higher frequencies. The reason we cannot
hear dog whistles, while they can, is because the frequency of the whistle is
too high to be processed by our ears. Sounds that are too high for us to hear
are called ultrasonic.
Ultrasonic waves have many uses. In nature, bats emit ultrasonic waves
and listen to the echoes to help them know where walls are or to find prey.
Captains of submarines and other boats use special machines that send
out and receive ultrasonic waves. These waves help them guide their boats
through the water and warn them when another boat is near.
Pitch = frequency of sound
For example, middle C in equal temperament = 261.6 Hz
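In equal temperament, each semitone step multiplies the frequency by 2^(1/12). A small
sketch of how the 261.6 Hz figure arises, assuming the common A4 = 440 Hz reference
(the reference pitch is an assumption, not stated above):

    def equal_temperament_frequency(semitones_from_a4, a4=440.0):
        """Frequency of a note a given number of semitones above or below A4."""
        return a4 * 2.0 ** (semitones_from_a4 / 12.0)

    # Middle C lies 9 semitones below A4.
    print(equal_temperament_frequency(-9))   # about 261.6 Hz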
Sounds may be generally characterized by pitch, loudness, and quality.
The perceived pitch of a sound is just the ear’s response to frequency, i.e.,
for most practical purposes the pitch is just the frequency. The pitch percep-
tion of the human ear is understood to operate basically by the place theory,
with some sharpening mechanism necessary to explain the remarkably high
resolution of human pitch perception.
The place theory and its refinements provide plausible models for the
perception of the relative pitch of two tones, but do not explain the phenom-
enon of perfect pitch.
The just noticeable difference in pitch is conveniently expressed in cents,
and the standard figure for the human ear is 5 cents.

Details About Pitch

Although for most practical purposes, the pitch of a sound can be said
to be simply a measure of its frequency, there are circumstances in which a
constant frequency sound can be perceived to be changing in pitch.
One of the most consistently observed “psychoacoustic” effects is that a sus-
tained high frequency sound (>2kHz) which is increased steadily in inten-
sity will be perceived to be rising in pitch, whereas a low frequency sound
(<2kHz) will be perceived to be dropping in pitch.
The perception of the pitch of short pulses differs from that of sustained
sounds of the same measured frequency. If a short pulse of a pure tone is
decaying in amplitude, it will be perceived to be higher in pitch than an
identical pulse which has steady amplitude. Interfering tones or noise can
cause an apparent pitch shift.
Further discussion of these and other perceptual aspects of pitch may be
found in Chapter 7 of Rossing, The Science of Sound, 2nd. Ed.

Effect of Loudness Changes on
Perceived Pitch

A high pitch (>2kHz) will be perceived to be getting higher if its loud-
ness is increased, whereas a low pitch (<2kHz) will be perceived to be going
lower with increased loudness. Sometimes called “Stevens’s rule” after an
early investigator, this psychoacoustic effect has been extensively investi-
gated.
With an increase of sound intensity from 60 to 90 decibels, Terhardt
found that the pitch of a 6kHz pure tone was perceived to rise over 30 cents.
A 200 Hz tone was found to drop about 20 cents in perceived pitch over the
same intensity change.
Studies with the sounds of musical instruments show less perceived
pitch change with increasing intensity. Rossing reports a perceived pitch
change of around 17 cents for a change from 65 dB to 95 dB. This perceived
change can be upward or downward, depending upon which harmonics are
predominant. For example, if the majority of the intensity comes from har-
monics which are above 2 kHz, the perceived pitch shift will be upward.

Perfect Pitch

“Perfect pitch” or “absolute pitch” refers to the ability of some persons to
recognize the pitch of a musical note without any discernible pitch stand-
ard, as if the person can recognize a pitch like the eye discerns the color of
an object. Most persons apparently have only a sense of relative pitch and
can recognize a musical interval, but not an isolated pitch.
Rossing suggests that less than 0.01% of the population appear to be able
to recognize absolute pitches, whereas over 98% of the population can do
the corresponding visual task of recognizing colors with no color standard
present.
See Rossing, 2nd Ed, p122, sec 7.7

Tone & Harmonics

Another difference you may have noticed between sounds is that some
sounds are pleasant while others are unpleasant. A beginning violin player
sounds very different than a violin player in a symphony, even if they are
playing the same note. A violin also sounds different than a flute playing
the same pitch. This is because they have a different tone, or sound qual-
ity. When a source vibrates, it actually vibrates with many frequencies at
the same time. Each of those frequencies produces a wave. Sound quality
depends on the combination of different frequencies of sound waves. Im-
agine a guitar string tightly stretched. If we strum it, the energy from our
finger is transferred to the string, causing it to vibrate. When the whole
string vibrates, we hear the lowest pitch. This pitch is called the fundamen-
tal. Remember, the fundamental is really only one of many pitches that the
string is producing. Parts of the string vibrating at frequencies higher than
the fundamental are called overtones, while those vibrating in whole num-
ber multiples of the fundamental are called harmonics. A frequency of two
times the fundamental will sound one octave higher and is called the second
harmonic. A frequency four times the fundamental will sound two octaves
higher and is called the fourth harmonic. Because the fundamental is one
times itself, it is also called the first harmonic.

What is the difference between music and noise?
Both music and noise are sounds, but how can we tell the difference?
Some sounds, like construction work, are unpleasant, while others, such as
your favorite band, are enjoyable to listen to. If this were the only way to tell
the difference between noise and music, everyone’s opinion would be differ-
ent. The sound of rain might be pleasant music to you, while the sound of
your little brother practicing piano might be an unpleasant noise. To help
classify sounds, there are three properties which a sound must have to be
musical. A sound must have an identifiable pitch, a good or pleasing quality
of tone, and a repeating pattern or rhythm to be music. Noise, on the other
hand, has no identifiable pitch, no pleasing tone, and no steady rhythm.

Loudness
Loudness is not simply sound intensity!

Sound loudness is a subjective term describing the strength of the ear’s
perception of a sound. It is intimately related to sound intensity but can
by no means be considered identical to intensity. The sound intensity must
be factored by the ear’s sensitivity to the particular frequencies contained
in the sound. This is the kind of information contained in equal loudness
curves for the human ear. It must also be considered that the ear’s response
to increasing sound intensity is a “power of ten” or logarithmic relation-
ship. This is one of the motivations for using the decibel scale to measure
sound intensity. A general “rule of thumb” for loudness is that the power
must be increased by about a factor of ten to sound twice as loud. To more
realistically assess sound loudness, the ear’s sensitivity curves are factored
in to produce a phon scale for loudness. The factor of ten rule of thumb can
then be used to produce the sone scale of loudness. In practical sound level
measurement, filter contours such as the A, B, and C contours are used to
make the measuring instrument more nearly approximate the ear.

Since “loudness” is a subjective measure of perception, one must be
careful about how much accuracy one attributes to it. But though ff is much
louder than p in dynamic level, it is not 1000 times louder, so one must attempt
to develop a scale of loudness that comes closer to mapping the ear’s per-
ception. The “rule of thumb” for loudness is one way to attempt that.

“Rule of Thumb” for Loudness

A widely used “rule of thumb” for the loudness of a particular sound is
that the sound must be increased in intensity by a factor of ten for the sound
to be perceived as twice as loud. A common way of stating it is that it takes 10
violins to sound twice as loud as one violin. Another way to state the rule is
to say that the loudness doubles for every 10 phon increase in the sound loud-
ness level. Although this rule is widely used, it must be emphasized that it is
an approximate general statement based upon a great deal of investigation of
average human hearing, but it is not to be taken as a hard and fast rule.
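Stated in terms of the phon and sone scales mentioned earlier, the rule says loudness
in sones doubles for every 10 phon increase, with 40 phons conventionally taken as
1 sone (the 40-phon reference is a standard convention assumed here rather than
quoted from the text). A minimal sketch:

    def phons_to_sones(phons):
        """Loudness in sones; doubles for every 10 phon increase above 40 phons."""
        return 2.0 ** ((phons - 40.0) / 10.0)

    print(phons_to_sones(40))   # 1 sone
    print(phons_to_sones(50))   # 2 sones - twice as loud
    print(phons_to_sones(60))   # 4 sones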

Why is it that doubling the sound intensity to the ear does not produce
a dramatic increase in loudness? We cannot give answers with complete
confidence, but it appears that there are saturation effects. Nerve cells have
maximum rates at which they can fire, and it appears that doubling the
sound energy to the sensitive inner ear does not double the strength of the
nerve signal to the brain. This is just a model, but it seems to correlate with
the general observations which suggest that something like ten times the
intensity is required to double the signal from the inner ear.
One difficulty with this “rule of thumb” for loudness is that it is applica-
ble only to adding loudness for identical sounds. If a second sound is widely
enough separated in frequency to be outside the critical band of the first,
then this rule does not apply at all.
While not a precise rule even for the increase of the same sound, the rule
has considerable utility along with the just noticeable difference in sound
intensity when judging the significance of changes in sound level.

Adding Loudness

When one sound is produced and another sound is added, the increase
in loudness perceived depends upon its frequency relative to the first sound.
Insight into this process can be obtained from the place theory of pitch per-
ception. If the second sound is widely separated in pitch from the first, then
they do not compete for the same nerve endings on the basilar membrane of
the inner ear. Adding a second sound of equal loudness yields a total sound
about twice as loud. But if the two sounds are close together in frequency,
within a critical band, then the saturation effects in the organ of Corti are
such that the perceived combined loudness is only slightly greater than ei-
ther sound alone. This is the condition which leads to the commonly used
rule of thumb for loudness addition.

Critical Band

When two sounds that are equally loud when sounded separately are close
together in pitch, their combined loudness when sounded together will be
only slightly greater than that of one of them alone. They may be said to be in the
same critical band where they are competing for the same nerve endings
on the basilar membrane of the inner ear. According to the place theory
of pitch perception, sounds of a given frequency will excite the nerve cells
of the organ of Corti only at a specific place. The available receptors show
saturation effects which lead to the general rule of thumb for loudness by
limiting the increase in neural response.
If the two sounds are widely separated in pitch, the perceived loudness of
the combined tones will be considerably greater because they do not overlap
on the basilar membrane and compete for the same hair cells. The phenom-
enon of the critical band has been widely investigated.
Backus reports that this critical band is about 90 Hz wide for sounds
below 200 Hz and increases to about 900 Hz for frequencies around 5000
Hertz. It is suggested that this corresponds to a roughly constant length on
the basilar membrane of about 1.2 mm, involving some 1300 hair
cells. If the tones are far apart in frequency (not within a critical band), the
combined sound may be perceived as twice as loud as one alone.

Critical Band Measurement

For low frequencies the critical band is about 90 Hz wide. For higher
frequencies, it is between a whole tone and 1/3 octave wide.
Center Freq (Hz)    Critical Bandwidth (Hz)
100                 90
200                 90
500                 110
1000                150
2000                280
5000                700
10000               1200

Timbre

Sounds may be generally characterized by pitch, loudness, and quality.
Sound “quality” or “timbre” describes those characteristics of sound which
allow the ear to distinguish sounds which have the same pitch and loud-
ness. Timbre is then a general term for the distinguishable characteristics
of a tone. Timbre is mainly determined by the harmonic content of a sound
and the dynamic characteristics of the sound such as vibrato and the attack-
decay envelope of the sound.
Some investigators report that it takes a duration of about 60 ms to
recognize the timbre of a tone, and that any tone shorter than about 4 ms is
perceived as an atonal click. It is suggested that it takes about a 4 dB change
in mid or high harmonics to be perceived as a change in timbre, whereas
about 10 dB of change in one of the lower harmonics is required.

Harmonic Content

The primary contributors to the quality or timbre of the sound of a mu-
sical instrument are harmonic content, attack and decay, and vibrato. For
sustained tones, the most important of these is the harmonic content, the
number and relative intensity of the upper harmonics present in the sound.
Some musical sound sources have overtones which are not harmonics
of the fundamental. While there is some efficiency in characterizing such
sources in terms of their overtones, it is always possible to characterize a
periodic waveform in terms of harmonics - such an analysis is called Fou-
rier analysis. It is common practice to characterize a sound waveform by
the spectrum of harmonics necessary to reproduce the observed waveform.

The recognition of different vowel sounds of the human voice is largely
accomplished by analysis of the harmonic content by the inner ear. Their
distinctly different quality is attributed to vocal formants, frequency ranges
where the harmonics are enhanced.

Attack and Decay

The primary contributors to the quality or timbre of the sound of a musi-
cal instrument are harmonic content, attack and decay, and vibrato/tremolo.

Consider the sound envelope of a plucked guitar string. The plucking
action gives it a sudden attack characterized by a rapid
rise to its peak amplitude. The decay is long and gradual by comparison. The
ear is sensitive to these attack and decay rates and may be able to use them
to identify the instrument producing the sound.

Striking a cymbal with a stick gives a different sound envelope: the at-
tack is almost instantaneous, but the decay envelope is very long. The time
period involved is about half a second. The plucked guitar string above spans
about half a second as well, but since its frequency is much lower, the
individual periods of its sound envelope can be resolved.

Vibrato/Tremolo

The primary contributors to the quality or timbre of the sound of a
musical instrument are harmonic content, attack and decay, and vibrato/
tremolo. The ordinary definition of vibrato is “periodic changes in the pitch
of the tone”, and the term tremolo is used to indicate periodic changes in the
amplitude or loudness of the tone. So vibrato could be called FM (frequency
modulation) and tremolo could be called AM (amplitude modulation) of
the tone. Actually, in the voice or the sound of a musical instrument both
are usually present to some extent.
Vibrato is considered to be a desirable characteristic of the human voice
if it is not excessive. It can be used for expression and adds a richness to the
voice. If the harmonic content of a sustained sound from a voice or wind
instrument is reproduced precisely, the ear can readily detect the difference
in timbre because of the absence of vibrato. More realistic synthesized tones
will add some type of vibrato and/or tremolo to produce a more realistic
tone.
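A small NumPy sketch of this idea, generating a tone with both a pitch wobble
(vibrato, FM) and a loudness wobble (tremolo, AM). The 395 Hz center frequency and
roughly 5.8 Hz modulation rate are borrowed from the vowel measurement described
below; the modulation depths are arbitrary illustration values:

    import numpy as np

    sr = 44100                       # sample rate, Hz
    t = np.arange(2 * sr) / sr       # two seconds of time samples

    f0 = 395.0         # center frequency of the tone, Hz
    mod_rate = 5.8     # vibrato/tremolo rate, Hz
    vib_depth = 4.0    # peak frequency deviation, Hz (vibrato)
    trem_depth = 0.3   # fractional amplitude variation (tremolo)

    # Vibrato: integrate the instantaneous frequency to obtain the phase (FM).
    inst_freq = f0 + vib_depth * np.sin(2 * np.pi * mod_rate * t)
    phase = 2 * np.pi * np.cumsum(inst_freq) / sr

    # Tremolo: slow amplitude modulation (AM) applied on top of the FM tone.
    envelope = 1.0 + trem_depth * np.sin(2 * np.pi * mod_rate * t)
    tone = envelope * np.sin(phase)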

Consider an amplitude plot of a sustained “ee” vowel sound produced
by a female voice. The periodic amplitude change would be described as
tremolo by the ordinary definition of it. Pitch variation could also be heard
along with it, so vibrato was present as well. That is commonly the case. The
period of the amplitude modulation is about 0.17 seconds, or a modulation
frequency of about 5.8 Hz superimposed on a tone of frequency centered
at about 395 Hz. Rough frequency measurements gave frequencies of 392
Hz when the amplitude was high and 399 Hz when the amplitude was low.
It is not known whether or not this kind of variation is typical. Scaling the
amplitude variation gives a range of about 7 dB in intensity associated with
the amplitude modulation.
In his “The Acoustical Foundations of Music”, Ch 11, John Backus com-
ments that voice measurements have shown a pitch variation of a singing
voice some six to seven times per second usually accompanied by an ampli-
tude variation at the same rate. He references Sacerdote.
The comments of Berg and Stork in their book “The Physics of Sound”, 2nd
ed, are very close to what I would conclude from my experience and reading.
“The vibrato of a singer’s voice, for example, aids significantly in distinguish-
ing the voice from other musical sounds. The term ‘vibrato’ in general use
refers not only to periodic changes in pitch, but also to periodic changes in
amplitude, which should more correctly be called tremolo. The ‘diaphragm
vibrato’ of a flute player is close to pure tremolo; the vibrato obtained when
a trombone player wiggles the slide in and out is almost a pure pitch vibrato.
Singing vibrato is actually a mixture of true vibrato and tremolo. Vibrato on a
violin or other string instrument is close to pure pitch vibrato.”

Chapter 2
Signal sources
Geometric Waves

Simple geometric waves are often used in sound synthesis since they
have a rich complement of harmonics. These harmonics can be filtered to
produce a variety of sounds.

Square Wave

The square wave contains only odd harmonics, with the amplitude of the
nth harmonic proportional to 1/n.
Sawtooth Wave

The sawtooth wave is useful for synthesis since it contains all harmonics,
with the amplitude of the nth harmonic falling off as 1/n.
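A short additive-synthesis sketch (an illustration, not code from the original) that
builds both waves from their harmonic series as described above:

    import numpy as np

    def square_wave(freq, t, n_harmonics=20):
        """Sum of odd harmonics with amplitudes 1/n approximates a square wave."""
        wave = np.zeros_like(t)
        for n in range(1, 2 * n_harmonics, 2):        # n = 1, 3, 5, ...
            wave += np.sin(2 * np.pi * n * freq * t) / n
        return wave

    def sawtooth_wave(freq, t, n_harmonics=20):
        """Sum of all harmonics with amplitudes 1/n approximates a sawtooth wave."""
        wave = np.zeros_like(t)
        for n in range(1, n_harmonics + 1):           # n = 1, 2, 3, ...
            wave += np.sin(2 * np.pi * n * freq * t) / n
        return wave

    t = np.arange(44100) / 44100.0    # one second at a 44.1 kHz sample rate
    square = square_wave(220.0, t)
    sawtooth = sawtooth_wave(220.0, t)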

Erase Head

Before passing over the record head, a tape in a recorder passes over the
erase head which applies a high amplitude, high frequency AC magnetic
field to the tape to erase any previously recorded signal and to thoroughly
randomize the magnetization of the magnetic emulsion. Typically, the tape
passes over the erase head immediately before passing over the record head.
The gap in the erase head is wider than those in the record head; the
tape stays in the field of the head longer to thoroughly erase any previously
recorded signal.

Biasing
High fidelity tape recording requires a high frequency biasing signal to
be applied to the tape head along with the signal to “stir” the magnetization
of the tape and make sure each part of the signal has the same magnetic
starting conditions for recording. This is because magnetic tapes are very
sensitive to their previous magnetic history, a property called hysteresis.

A magnetic “image” of a sound signal can be stored on tape in the form
of magnetized iron oxide or chromium dioxide granules in a magnetic
emulsion. The tiny granules are fixed on a polyester film base, but the direc-
tion and extent of their magnetization can be changed to record an input
signal from a tape head.

Tape Playback

When a magnetized tape passes under the playback head of a tape re-
corder, the ferromagnetic material in the tape head is magnetized and that
magnetic field penetrates a coil of wire which is wrapped around it. Any
change in magnetic field induces a voltage in the coil according to Faraday’s
law. This induced voltage forms an electrical image of the signal which is
recorded on the tape.

Problem: The magnetization of the magnetic emulsion is proportional
to the recorded signal, while the induced voltage in the coil is proportional
to the rate at which the magnetization in the coil changes. This means that
for a signal with twice the frequency, the output signal is twice as great for
the same degree of magnetization of the tape. It is therefore necessary to
compensate for this increase in signal to keep high frequencies from being
boosted by a factor of two for each octave increase in pitch. This compensa-
tion process is called equalization.

Sound Synthesis

Periodic electric signals can be converted into sound by amplifying them
and driving a loudspeaker with them. One way to do this is to simply add
various amplitudes of the harmonics of a chosen pitch until the desired
timbre is obtained, called additive synthesis. Another way is to start with
geometric waves, which are rich in harmonic content, and filter the harmon-
ics to produce a new sound - subtractive synthesis.
Modern sound synthesis makes increasing use of MIDI for sequencing
and communication between devices.

Methods of Synthesis

Jeff Pressing in “Synthesizer Performance and Real-Time Techniques”
gives this list of approaches to sound synthesis.
• additive synthesis - combining tones, typically harmonics of varying
amplitudes
• subtractive synthesis - filtering of complex sounds to shape harmonic
spectrum, typically starting with geometric waves.
• frequency modulation synthesis - modulating a carrier wave with one
or more operators
• sampling - using recorded sounds as sound sources subject to modi-
fication
• composite synthesis - using artificial and sampled sounds to establish
resultant “new” sound
• phase distortion - altering speed of waveforms stored in wavetables
during playback
• waveshaping - intentional distortion of a signal to produce a modified result
• resynthesis - modification of digitally sampled sounds before playback
• granular synthesis - combining of several small sound segments into
a new sound
• linear predictive coding - technique for speech synthesis
• direct digital synthesis - computer modification of generated wave-
forms
• wave sequencing - linear combinations of several small segments to
create a new sound
• vector synthesis - technique for fading between any number of differ-
ent sound sources
• physical modeling - mathematical equations of acoustic characteristics
of sound

MIDI for Music

Musical Instrument Digital Interface (MIDI) is a data transfer protocol
which is widely used with music synthesizers. MIDI can be used as a con-
troller between modules in an integrated music system. It uses a serial data
connection with five leads. It uses two basic message types - channel and
system. Channel messages can be sent from machine to machine over any
one of 16 channels to control an instrument’s voice parameters or to control
the way the instrument responds to voice messages. System messages can be
directed to all devices in the system (called “common” messages) or can be
directed to a specific machine (exclusive). Within the MIDI protocol, a basic
set of standards has been developed called the General MIDI specification,
or just GM. It attempts to standardize common practices within MIDI and
make it more accessible to the general user. GM is particularly appropri-
ate for a personal computer-based MIDI system using a sound card in the
computer. Part of the GM standard requires support for a basic set of 128
instruments, a minimum of 24-voice polyphony, and polytimbrality to at
least 16 sounds deep.
Using a MIDI sequencer to control one or more instruments has some
significant practical benefits. A MIDI sequence which plays several minutes
of music can be stored in a few kilobytes of memory on a computer, whereas
the storage of a minute’s worth of digitally precise and clear CD quality
music directly on a computer disc might take 10 MB of memory. The MIDI
file is just a digital representation of the sequence of notes with information
about pitch, duration, voice, etc., and that takes much less memory than the
digitally recorded image of the complex sound.
Other practical benefits include the ability to transpose music without
changing its duration, to change its tempo without changing its pitch, or
change the synthetic instruments used to perform the piece of music. Draw-
backs include the inability to easily include a recorded voice part or played
instrument along with the MIDI sequenced sound, but on the other hand,
the music can be easily synchronized with multimedia events in a produc-
tion.

Phonograph Cartridge
The movement of a coil of wire in a magnetic field generates a voltage
according to Faraday’s law. The tracking of a groove on a vinyl record by
the needle on a phonograph cartridge may cause a tiny coil to move in a
magnetic field, generating an electrical image of the signal recorded in the
groove.

Chapter 3
Audio Compact Disc
Compact Disc Audio

Analog sound data is digitized by sampling at 44.1 kHz and coding as
binary numbers in the pits on the compact disc. As the focused laser beam
sweeps over the pits, it reproduces the binary numbers in the detection cir-
cuitry. The same function as the “pits” can be accomplished by magnetoop-
tical recording. The digital signal is then reconverted to analog form by a
D/A converter.
The tracks on a compact disc are nominally spaced by 1.6 micrometers,
close enough that they are able to separate reflected light into its compo-
nent colors like a diffraction grating.

Laser for Compact Discs

The detection of the binary data stored in the form of pits on the com-
pact disc is done with the use of a semiconductor laser. The laser is focused
to a diameter of about 0.8 mm at the bottom of the disc, but is further
focused to about 1.7 micrometers as it passes through the clear plastic sub-
strate to strike the reflective layer.
The Philips CQL10 laser has a wavelength of 790 nm in air. The depth
of the pits is about a quarter of the wavelength of this laser in the substrate
material.
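That quarter-wavelength figure is easy to check, assuming a refractive index of about
1.55 for the clear plastic substrate (the index is an assumed typical value, not given
in the text):

    wavelength_in_air = 790e-9    # laser wavelength in air, meters
    n_substrate = 1.55            # assumed refractive index of the substrate

    wavelength_in_substrate = wavelength_in_air / n_substrate   # ~510 nm
    pit_depth = wavelength_in_substrate / 4.0                   # ~130 nm

    print(pit_depth * 1e9)   # pit depth in nanometers, roughly 125-130 nm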

Polarizing Prism

A polarizing prism is made up of two prisms of a birefringent material
joined along a diagonal. The angle of cut is such that the plane of polariza-
tion parallel to the surface undergoes total internal reflection whereas the
plane perpendicular to the surface passes through. Because of the action
of the quarter-wave plate, the beam returning from the disc will be polar-
ized parallel to the surface and will be reflected 90°, toward the photodiode
detector.

Photodiode Detection

Laser light from the reflective layer of the disc returns through the quar-
ter-wave plate. This causes it to reflect in the beam-splitter so that it reaches
the photodiode for detection. However, if the beam strikes one of the pits,
which are about a quarter- wavelength in depth, the light is out of phase
with the light reflecting from the unaltered plane around it and tends to
cancel it. This produces enough change in light level to be detected by the
photodiode, and to be coded as the 0’s and 1’s of binary data.
Laser Beam Positioning
In order to be reliably decoded, the laser beam must be focused within
about 0.5 micrometers of the reflective surface, but the location of the bot-
tom of the disc may be uncertain by about 0.5 mm during rotation. To keep
the beam focused, a positioning coil drives the focusing lens up or down
in response to an error voltage from the detector. One scheme uses a cylin-
drical lens arrangement to focus light on the detector. When the beam is
properly focused, it projects a round beam and a zero error voltage results.

Digital Sampling

For the purpose of storing audio information in digital form, as on a com-
pact disc, the normal continuous wave audio signal (analog) must be con-
verted to digital form (analog-to-digital conversion). One can picture an A/D
conversion as rounding the signal to the nearest of the digits 0-9, but practical
schemes store the numbers
in binary form. The number of bits in the binary sampler determines the
accuracy with which the analog signal can be represented in digital form.

From this crude picture of digitizing in steps, perhaps you can appreciate
the industry standard of 16 bit sampling in which the voltage is sampled into
65,536 steps. In addition to the number of steps, the rate of sampling also af-
fects the fidelity of representation of the analog waveform. The standard sam-
pling rate is 44.1 kHz, so the assignment of one of 65,536 values to the signal is
done 44,100 times per second. If this recorded data is read from the CD in real
time, you are processing 1.4 million bits of information per second (1.4 Mbps).
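The 1.4 Mbps figure follows directly from the sampling parameters; two channels
(stereo) are assumed here, since the text does not state the channel count:

    sample_rate = 44100     # samples per second
    bits_per_sample = 16
    channels = 2            # assumed stereo

    bits_per_second = sample_rate * bits_per_sample * channels
    print(bits_per_second)  # 1,411,200 bits per second, about 1.4 Mbps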

Implication of Number of Bits

The number of bits used in a digital sampler determines the resolution
of the “image” of the signal being represented. A one-bit sampler would be
like a light switch, having two possible values, on and off. Samplers with
multiple bits are representing a signal in terms of a binary number, and the
number of bits determines the number of values which can be expressed,
just as a 3 digit number in the usual 10-based number system can take more
values than a 2 digit number. The number of bits determines the possible
dynamic range.

Bits and Dynamic Range

If you are doing a linear stair-step digitization of an analog waveform,
the number of bits used in the sampling determines the maximum dynamic
range you can faithfully represent with the sampling.
For example, an 8-bit sample can give you 256 possible values, so if the
softest sound you sample is represented by a 1, then the loudest sound you
can represent is a 256. The maximum range of amplitudes is then 256, and
since the power is proportional to the amplitude squared, the power range
is 256 squared, or 1 to 65,536 in power. This cor-
responds to a dynamic range of 48 decibels whereas the dynamic range of
an orchestra is about 40 to 100 dB, or 60 dB. By similar calculation, a 12 bit
digitization can give you a dynamic range of 72 dB and 16 bits can give you
96 dB of dynamic range.
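The same arithmetic in a few lines: the amplitude range of an n-bit sampler is 2^n,
the power range is its square, and the dynamic range in decibels is 10 log10 of the
power range (equivalently, about 6 dB per bit):

    import math

    def dynamic_range_db(bits):
        """Dynamic range of an n-bit linear sampler in decibels."""
        amplitude_range = 2 ** bits
        power_range = amplitude_range ** 2
        return 10.0 * math.log10(power_range)

    print(dynamic_range_db(8))    # about 48 dB
    print(dynamic_range_db(12))   # about 72 dB
    print(dynamic_range_db(16))   # about 96 dB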

Digital Data on a Compact Disc

Binary data (0’s and 1’s) are encoded on a compact disc in the form of
pits in the plastic substrate which are then coated with an aluminum film to
make them reflective.

The data is detected by a laser beam which tracks the concentric circular
lines of pits. The pits are 0.8 to 3 micrometers long and the rows are sepa-
rated by 1.6 micrometers.

Compact Disc Drive Details

In a compact disc player, a laser beam must track a spiral row of pits
which are 0.5 micrometers wide with track spacing 1.6 micrometers. Track-
ing is aided by a three-beam laser arrangement. In addition to staying on
the track, which is much narrower than the 100 micrometer groove sepa-
ration on a vinyl record, the rotation speed must be adjusted as the beam
tracks inward or outward. A linear speed of 1.25 m/s is maintained by in-
creasing the rotation speed from 3.5 to 8 revolutions per second as the beam
tracks inward toward the center of the disc.
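Those rotation rates are just what is required to keep the linear speed constant; the
relationship is radius = linear speed / (2 * pi * revolutions per second). A quick
check using the figures above:

    import math

    linear_speed = 1.25    # m/s, held constant along the track

    def track_radius(rev_per_sec):
        """Radius at which the given rotation rate yields the fixed linear speed."""
        return linear_speed / (2 * math.pi * rev_per_sec)

    print(track_radius(8.0) * 1000)   # ~25 mm, near the inner edge of the data
    print(track_radius(3.5) * 1000)   # ~57 mm, near the outer edge of the data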

Laser Tracking on the CD

The laser beam in the compact disc player must precisely track a row of
pits which encode the binary data on the disc. In the “three-beam” system,
a grating is used to produce the first order diffraction maximum to each
side of the main beam. Those diffracted beams overlap the track, and the
reflected light from the two side beams should be equal, on the average, if
the main beam is centered on the track. If they are unequal, then their difference can be
used to generate an error voltage to correct the tracking. The side beams
deviate only about 20 micrometers from the main beam.

Scaled Views of a Compact Disc

Data on a compact disc is stored in the form of pits in the plastic sub-
strate.
A reflective layer of aluminium is applied to reflect the laser beam. A protec-
tive coating is then applied to the top. The laser system reads the data from below.

Detection of CD Pits

The tracking laser beam sees the pits as raised areas which are about a
quarter-wavelength high for the laser light.

The reflected light from the pit is then 180° out of phase with the reflec-
tion from the flat area, so the reflected light intensity drops as the beam
moves over a pit. The threshold of the photodiode detector can be adjusted
to switch on this light level change.

CD Response to Defects

The signal from a compact disc is relatively insensitive to the presence
of small defects such as dust or fine scratches on the bottom surface of the
CD because the laser beam is fairly large at that point, about 0.8 mm. As
illustrated below, typical dust particles are much smaller than that. As the
laser is further focused down to about 1.7 micrometers at the depth of the
pits, any shadow from the small defects is blurred and indistinct and does
not cause a read error. Larger defects are handled by error-correcting codes
in the handling of the digital data.

Error-Correction of CD Signals

The data on a compact disc is encoded in such a way that some well-de-
veloped error-correction schemes can be used. A sophisticated error-cor-
rection code known as CIRC (Cross-Interleaved Reed-Solomon Code) is used
to deal with both burst errors from dirt and scratches and random errors
from inaccurate cutting of the disc. The data on the disc are formatted in
frames which contain 408 bits of audio data and another 180 bits of data
which include parity and sync bits and a subcode. A given frame can con-
tain information from other frames and the correlation between frames can
be used to minimize errors. Errors on the disc could lead to some output
frequencies above 22kHz (half the sampling frequency of 44.1 kHz) which
could cause serious problems by “aliasing” down to audible frequencies. A
technique called oversampling is used to reduce such noise. Using a digital
filter to sample four times and average provides a 6-decibel improvement in
signal-to-noise ratio. For more details, see the references.
Data Encoding on Compact Discs
When the laser in a compact disc player sweeps over the track of pits
which represents the data, a transition from a flat area to a pit area or vice
versa is interpreted as a binary 1, and the absence of a transition in a time
interval called a clock cycle is interpreted as a binary 0. This kind of detec-
tion is called an NRZI code. The particular NRZI code used with compact
discs is EFM (eight-to-fourteen modulation) in which eight bits of data are
represented by fourteen channel bits. In addition to the actual digital sound
data, parity and sync bits and a subcode are also recorded on the disc in
“frames”. In a given frame, 408 bits of audio data are recorded with another
180 bits of data which permit a sophisticated error-correction code to be
used. A given frame can contain information from other frames and the
correlation between frames can be used to minimize errors. In addition to
detection, a significant amount of computation must be done to decode the
signal and prepare it for conversion back to analog form with a DAC.

Detection of Compact Disc Data

The pits which encode the digital data on a compact disc are tracked by
a laser. The reflected light from the pits is out of phase with that from the
surrounding area, so the reflected light intensity drops when the laser moves
over a pit area. The nature of a photodiode is such that it can be used as the
sensing element in a light-activated switch. It conducts an electric current
which is proportional to the light falling on it. The photodiode and switch
can be adjusted so that a transition to a pit area will switch it off, and a
transition from a pit area will switch it on. Either transition is interpreted
as a binary 1, while the absence of a transition in a given clock cycle is in-
terpreted as a binary zero. The data on the disc is encoded in a sophisticated
way, so that decoding is necessary before sending the digital signal repre-
senting the sound to a digital-to-analog converter (DAC) for reconversion
to analog form.

Cylindrical Lens for Positioning

A clever use of a cylindrical lens generates a correction signal to position
the main focusing lens for the detector in a compact disc player. The combi-
nation of a symmetric lens and a cylindrical lens produces a circular beam
at only one distance past the cylindrical lens. A segmented photodiode ar-
rangement can detect whether the beam is circular and generate an error
voltage to reposition the main lens so that it is. The error voltage drives a
coil which can rapidly reposition the lens in response to changes in distance
to the CD as it rotates.

CD Storage Capacity

A compact disc can store more than 6 billion bits of binary data. This is
equivalent to 782 megabytes, and at 2000 characters per page this is equiva-
lent to about 275,000 pages of text (Rossing). Because the analog-to-digital
conversion for making CD’s involves 16-bit sampling of sound waveforms
at 44.1 kHz, the amount of data involved in the recording of high-fidelity
sound is very large. The 12 cm diameter compact disc can hold 74 minutes
of digital audio with a frequency response over the full audible range of 20-
20,000 Hz. The signal-to-noise ratio and the dynamic range can exceed 90
decibels.
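Multiplying out the audio parameters shows where these numbers come from (two
channels are assumed, and the extra bits used for error correction and subcode are
ignored):

    minutes = 74
    sample_rate = 44100
    bits_per_sample = 16
    channels = 2             # assumed stereo

    total_bits = minutes * 60 * sample_rate * bits_per_sample * channels
    print(total_bits)               # about 6.3 billion bits of audio data
    print(total_bits / 8 / 1e6)     # about 783 megabytes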

Broadcast Signals

Radio communication is typically in the form of AM radio or FM radio
transmissions. The broadcast of a single signal, such as a monophonic audio
signal, can be done by straightforward amplitude modulation or frequency
modulation. More complex transmissions utilize sidebands arising from
the sum and difference frequencies which are produced by superposition of
some signal upon the carrier wave. For example, in FM stereo transmission,
the sum of left and right channels (L+R) is used to frequency modulate the
carrier and a separate subcarrier at 38 kHz is also superimposed on the car-
rier. That subcarrier is then modulated with a (L-R) or difference signal so
that the transmitted signal can be separated into left and right channels for
stereo playback. In television transmission, three signals must be sent on the
carrier: the audio, picture intensity, and picture chrominance. This process
makes use of two subcarriers. Other transmissions such as satellite TV and
long distance telephone transmission make use of multiple subcarriers for
the broadcast of multiple signals simultaneously.
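A hedged sketch of how the FM stereo composite (multiplex) signal described above
could be formed: the L+R sum is sent directly, while the L-R difference rides on the
38 kHz subcarrier. The 19 kHz pilot tone included here is standard practice in FM
stereo but is not mentioned in the text:

    import numpy as np

    sr = 200_000                   # sample rate for this baseband sketch, Hz
    t = np.arange(sr) / sr         # one second of samples

    # Hypothetical left and right audio channels (simple test tones).
    left = np.sin(2 * np.pi * 440.0 * t)
    right = np.sin(2 * np.pi * 550.0 * t)

    pilot = 0.1 * np.sin(2 * np.pi * 19_000 * t)     # 19 kHz pilot tone
    subcarrier = np.sin(2 * np.pi * 38_000 * t)      # 38 kHz subcarrier

    # Composite baseband: (L+R) plus (L-R) modulated onto the subcarrier.
    composite = (left + right) + (left - right) * subcarrier + pilot
    # This composite signal would then frequency-modulate the station carrier.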

AM Radio

AM radio uses the electrical image of a sound source to modulate the
amplitude of a carrier wave. At the receiver end in the detection process,
that image is stripped back off the carrier and turned back into sound by a
loudspeaker.
When information is broadcast from an AM radio station, the electrical
image of the sound (taken from a microphone or other program source) is
used to modulate the amplitude of the carrier wave transmitted from the
broadcast antenna of the radio station. This is in contrast to FM radio where
the signal is used to modulate the frequency of the carrier.
The AM band of the electromagnetic spectrum is between 535 kHz and
1605 kHz, and the carrier waves are separated by 10 kHz.
A radio receiver can be tuned to receive any one of a number of radio
carrier frequencies in the area of the receiver. This is made practical by
transferring the signal from the carrier onto an intermediate frequency in
the radio by a process called heterodyning. In a heterodyne receiver, most
of the electronics is kept tuned to the intermediate frequency so that only
a small portion of the receiver circuit must be retuned when changing sta-
tions.
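A small illustration of the idea, assuming the intermediate frequency of 455 kHz commonly used in AM receivers; the station frequencies are arbitrary examples.

# Heterodyning sketch: mixing the incoming carrier with a local oscillator
# produces sum and difference frequencies; the receiver keeps the difference,
# which always lands on the fixed intermediate frequency (IF).
# 455 kHz is a commonly quoted IF for AM receivers (assumed here).

IF = 455e3  # Hz

def local_oscillator(carrier_hz):
    return carrier_hz + IF             # tune the LO above the station

for station in (540e3, 810e3, 1600e3):
    lo = local_oscillator(station)
    print(station, lo, lo - station)   # the difference is always 455 kHz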

84
85
FM Radio

FM radio uses the electrical image of a sound source to modulate the


frequency of a carrier wave. At the receiver end in the detection process,
that image is stripped back off the carrier and turned back into sound by a
loudspeaker.
When information is broadcast from an FM radio station, the electrical
image of the sound (taken from a microphone or other program source)

is used to modulate the frequency of the carrier wave transmitted from
the broadcast antenna of the radio station. This is in contrast to AM radio
where the signal is used to modulate the amplitude of the carrier.
The FM band of the electromagnetic spectrum is between 88 MHz and
108 MHz and the carrier waves for individual stations are separated by 200
kHz for a maximum of 100 stations. These FM stations have a 75 kHz maxi-
mum deviation from the center frequency, which leaves 25 kHz upper and
lower “guard bands” to minimize interaction with the adjacent frequency
band. This separation of the stations is much wider than that for AM sta-
tions, allowing the broadcast of a wider frequency band for higher fidelity
music broadcast. It also permits the use of sub-carriers which make possible
the broadcast of FM Stereo signals.

87
Frequency Modulation

The change in frequency, which is greatly exaggerated in typical illustrations,
is proportional to the amplitude of the signal. An FM radio carrier around
100 MHz is limited to a modulation of +/- 0.1 MHz. Normal FM stereo broadcast
stays within +/- 0.053 MHz.

88
Digital Surround Sound

Digital surround refers to surround sound systems which employ dis-


crete digital recordings of five channels of sound information. Digital sur-
round sound has been introduced into movie theaters in a form called
Dolby Stereo Digital. At the heart of Dolby Stereo Digital is an encoding
scheme called AC-3. The AC-3 based systems are now often referred to as
just “Dolby digital” in the consumer market. It’s hard to tell which is chang-
ing faster: the technology or the terminology.
Dolby Stereo Digital uses a digital data stream running at 320 kilobits
per second. The HDTV and laserdisc version of Dolby Surround AC-3 Digi-
tal runs at 384 kilobits per second and dynamically allocates the bits to
the channel with the most demanding signal. Use is made of perceptual
encoding to decide which parts of the audio signal would not be heard and
therefore can be eliminated. The system provides a slight delay in the center
channel sound to achieve a more realistic experience of the sounds arriving
at the listener's location from the other speakers.
Riggs, Michael, “Digital Surround Comes Home”, Stereo Review, May 1995, p. 62.
Ranada, David, “Inside Dolby Digital”, Stereo Review 61, Oct. 1996, pp. 81-84.

89
Perceptual Encoding for Digital Sound

Perceptual encoding refers to systems which dynamically determine the


number of bits of data given to a given channel of audio information based
on judgements of its importance to the sound perceived by the listener.
Sounds which are below the audibility threshold for the human ear should
not waste bits which could be devoted to a higher fidelity reproduction of
an important sound. Also, certain sounds are masked by others, and if it is
judged that a certain sound would be masked anyway, why not give those
bits to another sound which would be heard?
Perceptual encoders divide the sound into frequency bands and deter-
mine which bands contain essential audio information, based on rules for
audibility and masking. They can also deal with the signal just before and just
after a given point in the recording since the temporal environment also af-
fects human perception of sound.

90
MP3 Digital Sound

MP3 stands for MPEG 1 Layer 3. MPEG is a compression type for digi-
tal data. MP3 is a variation of this format that allows sound files to be
compressed by 90 percent without major degradation of the quality of the
sound. The compressed audio file takes up so much less storage space than
on a regular compact disc or tape that it has become very convenient for
transfer on the Internet.
The real possibility for sound compression without audible loss comes
from the fact that the sampling for CDs contains far more than the nec-
essary data. Sixteen-bit digital sampling at 44.1 kHz gives you a stagger-
ing amount of information. From the audio CD you get about 1.4 million
bits per second of information, much more information than your ears can
process. To create the MP3 signal, the information from the CD format is
divided into frequency subbands. Then the signal in each subband is exam-
ined in the process of encoding to decide how many bits to allocate to it. The
process employs a “psychoacoustic model” to decide which subbands will be
recorded most accurately and which will be discriminated against. The idea
is that only that which can realistically be heard by the ear is kept.
The favorite visual metaphor is the “polar bear in the snow storm”.
Against a dark mountain on a clear day, you would have to paint the polar

bear with great definition. But if the polar bear is in a snowstorm, you don’t
have to provide as much detail, because you are not going to see much detail
anyway. By analogy, if a sound in a particular subband is going to be masked
out by other subbands so that you won’t hear it anyway, you might as well
save the bits you were going to use to record it. The “psychoacoustic model”
makes judgements about which sounds were going to be masked out.
Some model is applied in the encoding of the high-resolution digital
sound image to MP3, and that model is inevitably going to take out some
audible information. You can improve the model by encoding at a higher bit
rate, because you are putting in more information. Typical current bit rates
are 128, 160, 192, 256 and 320 kbps. Tests show that the accuracy increases
significantly up to 256 kbps with some current decoders, so 256 kbps is
perhaps a good comparison standard. At 256,000 bits of information per
second, you have reduced the 1.4 Mbps to about 18% - compression by more
than five to one. Of course you can get ten to one at 128 kbps, but you can’t
expect to get it without noticeable loss of sound quality.
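The compression figures quoted above follow from simple arithmetic on the bit rates:

cd_rate = 16 * 44100 * 2               # ~1.41 Mbps for 16-bit stereo at 44.1 kHz

for mp3_rate in (128e3, 160e3, 192e3, 256e3, 320e3):
    ratio = cd_rate / mp3_rate
    print(int(mp3_rate / 1000), "kbps ->", round(ratio, 1), "to 1",
          "(", round(100 / ratio, 1), "% of the CD data )")
# 256 kbps keeps about 18% of the CD bit rate (roughly 5.5 to 1);
# 128 kbps keeps about 9% (roughly 11 to 1).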

92
Masking

You know I can’t hear you when the water is running!


This statement carries the essentials of the conventional wisdom about
sound masking. Low-frequency, broad banded sounds (like water running)
will mask higher frequency sounds which are softer at the listener’s ear (a
conversational tone from across the room). For a single frequency masking
tone, masking curves can be determined experimentally. Also, from the idea
of the just noticeable difference in sound intensity, one can approximately
calculate the amount of a added second sound that would exceed the jnd
and thus be audible.
Broadband white noise tends to mask all frequencies, and is approxi-
mately linear in that masking: if you raise the white noise by 10 dB, you
have to raise everything else by 10 dB for it to remain audible.

93
Masking Curves

Shown are the masking effects of 1200 Hz tones of various intensities.
Note that such a tone is effective in masking sounds above it in frequency,
but not below. The dips at 1200 and 2400 Hz come from the effects of beats,
which make the masked tone easier to detect.

Backus credits this data to Harvey Fletcher in “Speech and Hearing in
Communication”, p. 155.
Although masking is a complex phenomena, the experience of mask-
ing of higher frequency sounds by strong low frequency sounds is common

experience and of considerable significance to orchestration. It is easy to
create circumstances where a strong bass brass section can mask the softer,
higher frequency sounds of the woodwind section.

95
Audibility Threshold, Second Sound

If the general Just Noticeable Difference for sound intensity is taken to


be one decibel, then the addition of a second sound would have to raise the
total intensity by 1 dB to be heard. This is roughly a 25% increase, so the
second sound would have to have about 1/4 the intensity, or about 6 dB less
than the existing, masking sound. While this approach is not an adequate
treatment of the complex subject of masking, it does permit a calculation of
the amount of a second sound required to meet any criterion which is stated
in terms of the decibel increase in the sound field.
If a masking sound of level L_A dB is present, and the total sound field
must be increased by a dB for the change to be audible, then a second sound
would be masked if its level were below

    L_B = L_A + 10 log10( 10^(a/10) - 1 )  dB
The real test of any model is its comparison with experiment, and if you
compare this calculation with the experimental curves for masking by pure
tones above, you will find that it does not agree well. One of the things this
suggests is that using a 1 dB increase in overall sound field intensity as the
just noticeable difference is an oversimplification. In fact, it tells you that

you should be quite wary of using the 1 dB JND at all except for the assess-
ment of how much you should increase the dB level of the same sound to
produce an audible difference.

97
Calculation Details, Masking Threshold

If the amount of sound B which must be added to pre-existing sound
A must increase the overall decibel level by an amount a in decibels, that
requirement can be stated as

    10 log10( (I_A + I_B) / I_0 ) - 10 log10( I_A / I_0 ) = a

From the properties of logarithms, both sides can be expressed as powers
of 10:

    (I_A + I_B) / I_A = 10^(a/10)

Solving for the ratio of the two intensities:

    I_B / I_A = 10^(a/10) - 1

This ratio, expressed in decibels, can be added to the level of I_A in dB (it
is negative for small a, about -6 dB when a = 1 dB) to get the threshold
level for I_B which could just be heard.
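A short numerical check of the relation above for a masker at 80 dB and a required increase of a = 1 dB:

import math

def masked_threshold(masker_db, a_db=1.0):
    """Level (dB) below which a second sound is masked, if the total field
    must rise by a_db before the change is audible."""
    return masker_db + 10 * math.log10(10**(a_db / 10) - 1)

print(masked_threshold(80.0))   # about 74.1 dB, i.e. roughly 6 dB below the masker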

98
Chapter 4
Electrical principles
Microphones

Microphones are transducers which detect sound signals and produce


an electrical image of the sound, i.e., they produce a voltage or a current
which is proportional to the sound signal. The most common microphones
for musical use are dynamic, ribbon, or condenser microphones. Besides the
variety of basic mechanisms, microphones can be designed with different
directional patterns and different impedances.

102
Dynamic Microphones

Advantages:
• Relatively cheap and rugged.
• Can be easily miniaturized.
Disadvantages:
• The uniformity of response to different frequencies does not match that
of the ribbon or condenser microphones.
Principle: sound moves the cone and the attached coil of wire moves in
the field of a magnet. The generator effect produces a voltage which “im-
ages” the sound pressure variation - characterized as a pressure microphone.

103
The geometry of a dynamic microphone is like that of a tiny loudspeak-
er, and that is not just a coincidence. A dynamic microphone is essentially
the inverse of a dynamic loudspeaker. In a dynamic microphone, the sound
pressure variations move the cone, which moves the attached coil of wire
in a magnetic field, which generates a voltage. In the loudspeaker, the in-
verse happens: the electric current associated with the electrical image of the
sound is driven through the coil in the magnetic field, generating a force on
that coil. The coil moves in response to the audio signal, moving the cone
and producing sound in the air.
A small loudspeaker can be used as a dynamic microphone, and this fact
is exploited in the construction of small intercom systems. Depending upon
the position of the Talk-Listen switch, the device on either end of the inter-
com system can be used as a microphone or a loudspeaker. Of course, this
is not a high fidelity process, and for commercial dynamic microphones, the
device is optimized for use as a microphone, not a loudspeaker.

104
Ribbon Microphones

Principle: the air movement associated with the sound moves the metal-
lic ribbon in the magnetic field, generating an imaging voltage between the
ends of the ribbon which is proportional to the velocity of the ribbon - char-
acterized as a “velocity” microphone.

105
Advantages:
• Adds “warmth” to the tone by accenting lows when close-miked.
• Can be used to discriminate against distant low frequency noise in its
most common gradient form.

Disadvantages:
• Accenting lows sometimes produces “boomy” bass.
• Very susceptible to wind noise. Not suitable for outside use unless
very well shielded.

106
Condenser Microphones

Principle: sound pressure changes the spacing between a thin metallic
membrane and the stationary back plate. The plates are charged to a total
charge Q = CV.
Advantages:
• Best overall frequency response makes this the microphone of choice
for many recording applications.

Disadvantages:
• Expensive.
• May pop and crack when close miked.
• Requires a battery or external power supply to bias the plates.

A change in plate spacing will cause a change in charge Q and force a
current through resistance R. This current “images” the sound pressure,
making this a “pressure” microphone. The charge is Q = CV with capacitance
C = εA/d, where C is the capacitance, V the voltage of the biasing battery,
A the area of each plate and d the separation of the plates.

108
Crystal Microphone

Crystals which demonstrate the piezoelectric effect produce volt-


ages when they are deformed. The crystal microphone uses a thin strip of
piezoelectric material attached to a diaphragm. The two sides of the crystal
acquire opposite charges when the crystal is deflected by the diaphragm.
The charges are proportional to the amount of deformation and disappear
when the stress on the crystal disappears. Early crystal microphones used
Rochelle salt because of its high output, but it was sensitive to moisture and
somewhat fragile. Later microphones used ceramic materials such as bari-
um titanate and lead zirconate. The electric output of crystal microphones
is comparatively large, but the frequency response is not comparable to a
good dynamic microphone, so they are not serious contenders for the music
market.

109
Parabolic Microphone

For recording distant bird or


animal sounds, or for just plain
eavesdropping, it is hard to beat the
directionality and sensitivity of the
parabolic microphone.

110
Amplifiers

The task of an audio amplifier is to take a small signal and make it bigger
without making any other changes in it. This is a demanding task, because

a musical sound usually contains several frequencies, all of which must


be amplified by the same factor to avoid changing the waveform and hence
the quality of the sound. An amplifier which multiplies the amplitudes of all
frequencies by the same factor is said to be linear. Departures from linearity
lead to various types of distortions.
The operational details of amplifiers are buried in the field of electron-
ics, but for audio purposes it is usually safe to say that current commercial
audio amplifiers are so good that a normally operating amplifier is seldom
the limitation on the fidelity of a sound reproduction system. One must be

sure that the amplifier can provide enough power to drive the existing loud-
speakers, but otherwise amplifiers are typically one of the most trouble-free
elements of a sound system.

112
Impedance Matching

In the early days of high fidelity music systems, it was crucial to pay at-
tention to the impedance matching of devices since loudspeakers were driv-
en by output transformers and the input power of microphones to preamps
was something that had to be optimized. The integrated solid state circuits
of modern amplifiers have largely removed that problem, so this section
just seeks to establish some perspective about when impedance matching is
a valid concern.
As a general rule, the maximum power transfer from an active device
like an amplifier or antenna driver to an external device occurs when the
impedance of the external device matches that of the source. That optimum
power is 50% of the total power when the impedance of the amplifier is
matched to that of the speaker. Improper impedance matching can lead to
excessive power use, distortion, and noise problems. The most serious prob-
lems occur when the impedance of the load is too low, requiring too much
power from the active device to drive the load at acceptable levels. On the
other hand, the prime consideration for an audio reproduction circuit is
high fidelity reproduction of the signal, and that does not require optimum
power transfer.
In modern electronics, the integrated circuits of an amplifier have at

their disposal hundreds to thousands of active transistor elements which
can with appropriate creative use of feedback make the performance of the
amplifier almost independent of the impedances of the input and output
devices within a reasonable range.
On the input side, the amplifier can be made to have almost arbitrarily
high input impedance, so in practice a microphone sees an impedance consid-
erably higher than its own impedance. Although that does not optimize power
transfer from the microphone, that is no longer a big issue since the amplifier
can take the input voltage and convert it to a larger voltage - the term currently
used is “bridging” to a larger image of the input voltage pattern.
On the output side, a loudspeaker may still have a nominal impedance
of something like 8 ohms, which formerly would have required having an
amplifier output stage carefully matched to 8 ohms. But now with the active
output circuitry of audio amplifiers, the effective output impedance may be
very low. The active circuitry controls the output voltage to the speaker so
that the appropriate power is delivered.

114
Matching Amplifier to Loudspeaker

The maximum power transfer from an active device like an amplifier to


an external device like a speaker occurs when the impedance of the external
device matches that of the source. That optimum power is 50% of the total
power when the impedance of the amplifier is matched to that of the speaker.
But modern audio amplifiers are active control devices, and the imped-
ance matching of the amplifier to the loudspeaker is no longer considered
best practice. Modern solid state amplifiers are sometimes referred to as
“bridging” devices which take an input voltage from an audio source and
form an amplified image of that voltage at the output. The output imped-
ance is low, and the output voltage and power are controlled dynamically.
The implications of the simplified model for resistive amplifier outputs
and speakers may nevertheless be instructive as a reference. For example, as-
sume that the maximum distortion-free voltage from the amplifier is 40 volts:
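The table of power values that accompanied this example is not reproduced here, so the short sketch below regenerates the kind of numbers it would contain under this purely resistive model. The 40 V limit is from the text; the 8-ohm amplifier output impedance and the set of load values are assumptions chosen for illustration.

# Resistive model only (both impedances treated as pure resistances, which
# the text notes is an oversimplification). Assumes 40 V maximum
# distortion-free output and an 8-ohm amplifier output impedance.
V = 40.0        # volts
R_out = 8.0     # ohms (assumed internal impedance)

for R_load in (4.0, 8.0, 16.0, 32.0):
    I = V / (R_out + R_load)        # current through the series circuit
    P_load = I**2 * R_load          # power delivered to the speaker
    P_internal = I**2 * R_out       # power dissipated inside the amplifier
    print(R_load, round(P_load, 1), round(P_internal, 1))
# The matched 8-ohm load receives 50% of the total power; a 4-ohm load
# receives less output power while dissipating more inside the amplifier.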

To emphasize the oversimplification involved in the above model, it
should be noted that the loudspeaker is not a simple resistor - it contains a
coil or coils with significant inductance, and is typically composed of two
or three speakers with a crossover network that has capacitance and in-
ductance. So the impedance of the loudspeaker will inevitably vary with
frequency. The only present-day amplifiers that would have a characteristic
output impedance like that shown would be those designed around “valve”
or “vacuum tube” output stages.
Note that it is safer in terms of total power to go to higher impedance
speakers (series speakers), but more typical practice is to put speakers in paral-
lel, lowering the impedance. Note in the table above that lowering the imped-
ance below the output impedance of the amplifier not only reduces the output
power but increases the internally dissipated power in the amplifier.

116
This diagram shows the relationships used to obtain the power values
in the table above. Note that it assumes a resistive nature of both the loud-
speaker impedance and the internal impedance, neither of which is strictly
true.

117
Amplifier Distortion

The amplitudes of all frequencies within an amplifier’s operating range


must be amplified by the same factor to avoid distortion. An amplifier
which satisfies this requirement is said to be perfectly linear. If the peaks of
the waveform are clipped, this gives rise to what is called harmonic distor-
tion. Another type of distortion is intermodulation distortion, which occurs
when different frequencies in the signal mix to produce sum and differ-
ence frequencies which didn’t exist in the signal. Transient distortion occurs
when amplifier components cannot handle the rate of change of the signal,
for example in rapid percussive attacks. There is also transient intermodula-
tion distortion (TIM) to which modern integrated circuits are susceptible.
Such circuits depend upon feedback for their linearity, but time delays in
the feedback can cause intermodulation distortion on fast transients in the
signal.

118
Harmonic Distortion

A common type of amplifier distortion is called harmonic distortion. It


can arise if any component in the amplifier clips the peaks of the waveform.
A common specification for high fidelity amplifiers is the total harmonic
distortion. This distortion may be less than 1%, or even less than 0.5% from
20-20,000 Hz for high quality amplifiers.

In the diagram, the input is a single frequency (pure sine wave), but
the output waveform is clipped by the amplifier. The result is that harmonic
frequencies not present in the original signal are produced at the output
(harmonic distortion). This harmonic distortion contains only odd harmon-
ics if the clipping is symmetrical. For example, a geometrical square wave

has only odd harmonics, and as a signal is clipped, it approaches a square
wave rather than a sine wave.
The frequency spectrum at right is that measured at the output of a
particular amplifier driven above its rated power. The spectrum has a larger
amount of odd harmonic than even harmonic output, but the fact that even
harmonics are present suggests that the distortion was not symmetrical with
respect to the waveform.
An amplifier can be said to be linear if the output voltage is strictly
proportional to the input signal. Any nonlinearity, such as that arising from
the semiconductor devices themselves, will give rise to harmonic distortion.
Such defects in the performance of the devices can be minimized by using
negative feedback in the circuit so long as the output is not overdriven to
the point of clipping.
Plots of frequency spectra such as those illustrated here can be impor-
tant diagnostic and research tools. Converting a signal from a plot as a func-
tion of time to a plot as a function of frequency is called Fourier analysis,
and a common display is the Fast Fourier Transform or FFT of the signal.
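A small sketch of the effect described above: a pure sine wave is symmetrically clipped and its FFT shows energy appearing at the odd harmonics. The sample rate, frequency and clipping level are arbitrary choices for illustration.

import numpy as np

fs = 8000                      # sample rate, Hz
t = np.arange(fs) / fs         # one second of signal
f0 = 100                       # fundamental, Hz
signal = np.sin(2 * np.pi * f0 * t)
clipped = np.clip(signal, -0.5, 0.5)       # symmetric clipping

spectrum = np.abs(np.fft.rfft(clipped)) / len(clipped)
freqs = np.fft.rfftfreq(len(clipped), 1 / fs)
for harmonic in range(1, 8):
    k = np.argmin(np.abs(freqs - harmonic * f0))
    print(harmonic * f0, "Hz:", round(float(spectrum[k]), 4))
# Odd harmonics (300, 500, 700 Hz ...) stand out; even harmonics are
# essentially absent because the clipping is symmetric.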

120
Intermodulation Distortion

Non-linearity in amplifier components causes mixing of frequency com-


ponents to form components at sum and difference frequencies. This in-
termodulation distortion is particularly troublesome in the reproduction
of music because it generates frequencies which were not present in the
original music and are thus very noticeable. Harmonic distortion may also
be serious, but at least the musical sound probably already had these har-
monics present as part of the harmonic content of the sound, so it can be
tolerated to a greater degree than intermodulation distortion.
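A sketch of how a non-linearity generates sum and difference frequencies; the quadratic term standing in for the amplifier's non-linearity, and the two tone frequencies, are arbitrary illustrative choices.

import numpy as np

fs = 8000
t = np.arange(fs) / fs
f1, f2 = 440.0, 550.0                       # two input tones
x = np.sin(2 * np.pi * f1 * t) + np.sin(2 * np.pi * f2 * t)
y = x + 0.2 * x**2                          # mildly non-linear "amplifier"

spectrum = np.abs(np.fft.rfft(y)) / len(y)
freqs = np.fft.rfftfreq(len(y), 1 / fs)
for f in (f1, f2, f2 - f1, f2 + f1):        # originals plus difference and sum
    k = np.argmin(np.abs(freqs - f))
    print(f, "Hz:", round(float(spectrum[k]), 4))
# 110 Hz (difference) and 990 Hz (sum) appear even though neither was present
# in the input -- the signature of intermodulation distortion.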

121
Dynamic Loudspeaker Principle

A current-carrying wire in a magnetic field experiences a magnetic force
perpendicular to the wire.

An audio signal source such as a microphone or recording produces


an electrical “image” of the sound. That is, it produces an electrical signal
that has the same frequency and harmonic content, and a size that reflects
the relative intensity of the sound as it changes. The job of the amplifier is
to take that electrical image and make it larger -- large enough in power to

drive the coils of a loudspeaker. Having a “high fidelity” amplifier means
that you make it larger without changing any of its properties. Any changes
would be perceived as distortions of the sound since the human ear is amaz-
ingly sensitive to such changes. Once the amplifier has made the electrical
image large enough, it applies it to the voice coils of the loudspeaker, mak-
ing them vibrate with a pattern that follows the variations of the original
signal. The voice coil is attached to and drives the cone of the loudspeaker,
which in turn drives the air. This action on the air produces sound that
more-or-less reproduces the sound pressure variations of the original signal.

123
Loudspeaker Basics

The loudspeakers are almost always the


limiting element on the fidelity of a repro-
duced sound in either home or theater. The
other stages in sound reproduction are mostly
electronic, and the electronic components are
highly developed. The loudspeaker involves
electromechanical processes where the ampli-
fied audio signal must move a cone or other
mechanical device to produce sound like the
original sound wave. This process involves
many difficulties, and usually is the most imperfect of the steps in sound
reproduction.
Choose your speakers carefully. Some basic ideas about speaker enclosures
might help with perspective.
Once you have chosen a good loudspeaker from a reputable manufac-
turer and paid a good price for it, you might presume that you would get
good sound reproduction from it. But you won’t --- not without a good
enclosure. The enclosure is an essential part of sound reproduction because of
the following problems with a direct radiating loudspeaker:

• The sound from the back of the speaker cone will tend to cancel the
sound from the front, especially for low frequencies.
• The free cone speaker is very inefficient at producing sound wavelengths
longer than the diameter of the speaker.
• Speakers have a free-cone resonant frequency which distorts the sound
by responding too strongly to frequencies near resonance.
• More power is needed in the bass range, making multiple drivers with
a crossover a practical necessity for good sound.

125
Loudspeaker Details

An enormous amount of engineering work has gone into the design of


today’s dynamic loudspeaker. A light voice coil is mounted so that it can
move freely inside the magnetic field of a strong permanent magnet. The
speaker cone is attached to the voice coil and attached with a flexible mount-
ing to the outer ring of the speaker support. Because there is a definite
“home” or equilibrium position for the speaker cone and there is elasticity
of the mounting structure, there is inevitably a free cone resonant frequency
like that of a mass on a spring. The frequency can be determined by adjust-
ing the mass and stiffness of the cone and voice coil, and it can be damped
and broadened by the nature of the construction, but that natural mechani-
cal frequency of vibration is always there and enhances the frequencies in
the frequency range near resonance. Part of the role of a good enclosure is
to minimize the impact of this resonant frequency.

126
127
Types of Enclosures

The production of a good high-fidelity loudspeaker requires that the


speakers be enclosed because of a number of basic properties of loudspeak-
ers. Just putting a single dynamic loudspeaker in a closed box will improve
its sound quality dramatically. Modern loudspeaker enclosures typically in-
volve multiple loudspeakers with a crossover network to provide a more
nearly uniform frequency response across the audio frequency range. Other
techniques such as those used in bass reflex enclosures may be used to ex-
tend the useful bass range of the loudspeakers.

128
The nature of the enclosure can affect the efficiency and the directional-
ity of a loudspeaker. The use of horn loudspeakers can provide higher effi-
ciency and more directionality, but in extremes can reduce the fidelity of the
sound. Line array enclosures can provide some directionality.
The term “infinite baffle” is
often encountered in discussions of
loudspeaker installations. It visual-
izes a loudspeaker mounted in an
infinite plane with unlimited vol-
ume behind it, but in practical use
may refer to a loudspeaker mounted
in the surface of a flat wall with con-
siderable volume of air behind it.
Because of the elastic properties of
the loudspeaker suspension, it will still exhibit its natural free-cone reso-
nance, but will be free of the diffraction effects observed with a small box
speaker, and essentially free of the effects of the compression of the air be-
hind the loudspeaker cone.

129
Use of Multiple Drivers in Loudspeakers

Even with a good enclosure, a single loudspeaker cannot be expected


to deliver optimally balanced sound over the full audible sound spectrum.
For the production of high frequencies, the driving element should be small
and light to be able to respond rapidly to the applied signal. Such high
frequency speakers are called “tweeters”. On the other hand, a bass speaker
should be large to efficiently impedance match to the air. Such speakers,
called “woofers”, must also be supplied with more power since the signal
must drive a larger mass. Another factor is that the ear’s response curves
discriminate against bass, so that more acoustic power must be supplied in
the bass range. It is usually desirable to have a third, mid-range, speaker to
achieve a smooth frequency response. The appropriate frequency signals are
routed to the speakers by a crossover network.

130
Horn Loudspeakers

Horn type loudspeakers use a large diaphragm which supplies periodic


pressure to a small entry port of a long horn. More compact versions use a
folded horn geometry. The large diaphragm system is called a “compres-
sion driver” since its large air displacement, fed into a small port, causes
a larger pressure variation than ordinary loudspeakers. The long ta-
pered horn increases the sound production efficiency by perhaps a factor of
ten compared to an ordinary open cone-type loudspeaker.

131
Line Array or Column Loudspeakers

One way to achieve a degree of directional-


ity with bass loudspeakers and to achieve a modest
increase in potential acoustic gain is to use several
identical bass speakers in a “line array” or column
geometry.
It may be surprising that the effect of this geom-
etry is to spread the sound in the horizontal plane and
narrow its spread in the vertical plane. This is the geo-
metric effect of sound diffraction, and is analogous to
the spreading of light through a narrow slit in the di-
rection perpendicular to the narrow dimension.
The directional patterns below show the spread-
ing of the sound in the horizontal plane on a polar

scale indicating the diminishing of the sound in decibels compared to the


centerline direction of the loudspeakers. For a column of bass speakers of
size 15” or 38 cm, λ=4D corresponds to a frequency of about 225 Hz while
the shortest wavelength, λ = 0.25D corresponds to a frequency of 3570 Hz.

132
Example of a line ar-
ray loudspeaker collection
which is ceiling-mounted
in an auditorium. It points
generally at the audience
and spreads the sound per-
pendicular to the array.

133
Directionality of Loudspeakers

While loudspeakers cannot achieve the extremes of directionality of mi-


crophones, efforts to control the directionality of sound from loudspeakers
can be productive. Horn type loudspeakers are generally more directional
than open cone-type loudspeakers. Speakers in a line array can be arranged
to spread the sound in the horizontal plane more than the vertical plane
to direct the sound energy more at the listeners. Making the loudspeak-
ers more directional can increase the potential acoustic gain that can be
achieved with a sound amplification system.
The example of a small loudspeaker shows that high frequencies from
loudspeakers are typically more directional than low frequencies. For sound
from a single loudspeaker, this means that the bass-treble balance will be-
come more prominent in bass as you move further off-axis. Most sound
systems use some kind of cross-over network so that the bass and treble can
be controlled separately to achieve a good balance. Often, horns for the bass
are used because they are more directional than other bass speakers and can
be combined with treble sources of comparable directionality.

134
135
Monaural and Stereo Signals

Monaural, or single source sound signals, combine all audio informa-


tion into a single channel. This is characteristic of older signals, but mon-
aural signals are also currently used in AM radio and in many large audi-
torium sound systems. Mono is used in AM radio because of bandwidth
limitations. For auditorium sound systems, the use of ordinary stereo is
problematic because those listeners close to one of the stereo speakers will
perceive all of the sound to be coming from that speaker, whereas the real
sound source may be center front. The resulting sound image problems are
so severe in large auditoriums that a single sound cluster is typically used
over the sound source to keep the sound image centered.
Stereo, or two discrete channel sound, gives a much more realistic sound
in a home listening room, allowing you to localize the instruments in an
orchestra for example. Discrete stereo with high channel separation is pos-
sible from any signal source which can present a left and right channel, and
can be broadcast with FM radio. Further enhancements to sound fields can
be made with surround sound.

136
Surround Sound

Surround sound is a term applied


to five channel sound (regular R & L
stereo, center front, plus left and right
rear speakers). It typically means Dol-
by Surround, the home application of
Dolby Stereo, which was introduced
into movie theaters in the 70’s. It is
a matrix encoding scheme which puts
four channels of information on two
recorded channels to be decoded into L, R, Center, and Surround upon
playback.
A popular adaptation of surround sound for home sound systems is
called Dolby Pro-Logic. To obtain five discrete channels of information (in-
cluding separate signals to LB and RB speakers) requires digital surround
sound.

137
Dolby Pro-Logic

Dolby Pro-Logic refers to a system for decoding Dolby Surround sig-


nals and directing them to five speakers. Receivers which are designated as
“surround” may passively separate the surround signal and send the center
channel signal equally to R and L front channels. Dolby Pro-Logic goes fur-
ther in that it extracts the center channel signal to send to the front center
speaker and actively cancels it from the L & R speakers. It attempts to steer
the signal to the correct channels, and in so doing it creates a more realistic
sound field, but it cannot overcome the limitations of the matrix encoding
scheme. The L to R channel separation is high, but between center and left
or right the separation is limited to about 3 dB. The separation between sur-
round and left or right is similarly limited. All surround speakers receive
the same signal. Precise sound localization requires 15 to 20 dB of channel
separation (Riggs). More precise localization is obtainable from digital sur-
round systems. One such system is called AC-3.

138
AC-3 Digital Surround Sound

AC-3 is the name given to an en-


coding scheme for a digital surround
system which is called a 5.1 channel
system (five discrete channels plus a
subwoofer output). The digital ap-
proach permits new ways of deal-
ing with the limited available band-
width. Since most of the bandwidth
of video/audio media must go to the
video portion, the audio signals must be squeezed into the remaining lim-
ited bandwidth. Squeezing the additional discrete channels into the digitally
recorded signal is aided by what is called perceptual encoding, making use
of the fact that some sounds are inaudible and others are masked by louder
ones; these signals can be simply removed by the encoder to make room for
more important sounds.
AC-3 also allocates bits between channels of the discrete system to shift

signal-handling capability to the channel with the greatest current demand.
AC-3 was originally developed for HDTV. AC stands for Audio Coding
and 3 is the generation of the design. The designation “Dolby digital” is
sometimes used as a name for this system.

140
Dolby Signal Processing

Dolby Stereo is the name given to the four-channel surround sound de-
veloped by Dolby Laboratories and introduced into movie theaters in the
70’s. It employed a matrix encoding scheme called Dolby Surround which
recorded four channels of information on two channels. The two channels
are decoded into L, R, Center and Surround upon playback. The center
channel is recorded identically on the left and right channels.
Riggs, Michael, “Digital Surround Comes Home”, Stereo Review, May 1995, p. 62.
Ranada, David, “Inside Dolby Digital”, Stereo Review 61, Oct. 1996, pp. 81-84.

142
Perceptual Encoding for Digital Sound

Perceptual encoding refers to systems which dynamically determine the


number of bits of data given to a given channel of audio information based
on judgements of its importance to the sound perceived by the listener.
Sounds which are below the audibility threshold for the human ear should
not waste bits which could be devoted to a higher fidelity reproduction of
an important sound. Also, certain sounds are masked by others, and if it is
judged that a certain sound would be masked anyway, why not give those
bits to another sound which would be heard?
Perceptual encoders divide the sound into frequency bands and deter-
mine which bands contain essential audio information, based on rules for
audibility and masking. They can also deal with the signal just before and just
after a given point in the recording since the temporal environment also af-
fects human perception of sound.
Perceptual encoding is used in the AC-3 system of Digital Surround
sound.
Riggs, Michael, “Digital Surround Comes Home”, Stereo Review, May 1995, p. 62.
Ranada, David, “Inside Dolby Digital”, Stereo Review 61, Oct. 1996, pp. 81-84.

Chapter 5
Simplified model of sound system
Simplified Model:
Sound Reinforcement

The microphone creates an electrical image of the sound, which is ampli-
fied and used to drive a loudspeaker. The loudspeaker provides more sound
to the listener than would otherwise have been received, but it also produces
sound at the location of the microphone. This feedback to the microphone
limits the amount of amplification which can be used. Control of the feedback
generally is the determining factor for the potential acoustic gain that can be
achieved by a sound reinforcement system.

148
Inverse Square Law Assumption

In the development of a simplified model for a sound amplification sys-


tem, a starting assumption is that the sound drops off according to the in-
verse square law. Of course this is not true in the real auditorium because
one of the primary goals in auditorium acoustics is to overcome the inverse
square law with natural reverberation. Nevertheless, this assumption per-
mits the calculation of an important limiting case so that the reverberation
effects in real auditoriums can be assessed by comparison.

149
Omnidirectional Assumption

In the development of a simplified model for a sound amplification sys-


tem, a starting assumption is that the microphones and loudspeakers are
omnidirectional, i.e., their response is equal in all directions. Of course no
real microphones or loudspeakers are truly omnidirectional, but this as-
sumption permits the calculation of an important limiting case. Under this
assumption, the potential acoustic gain of an amplification system can be
calculated geometrically. Such a model forms a good base for assessing the
improvements which can be made with directional microphones and direc-
tional speakers.

150
Numerical Example:
Need for Amplification

By the inverse square law, a doubling of distance will drop the sound
intensity to 1/4, corresponding to a drop of 6 decibels. Note that in the table
below, the distance is doubled in each successive step.

The following simplifying assumptions allow us to calculate the sound
levels and assess the need for amplification:
1. The sound drops off according to the inverse square law.
2. The microphones and loudspeakers are omnidirectional.

Level (dB)    Distance (ft)
    80              2
    74              4
    68              8
    62             16
    56             32
    50             64
    44            128

Starting with 80 dB at two feet and using the fact that every doubling of
distance will drop the level by 6 dB, we learn that a listener at 128 ft would
receive a sound level of only 44 dB!
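The table above can be regenerated directly from the inverse square law; a minimal sketch, using the 80 dB at 2 ft starting point from the example:

import math

level = 80.0          # dB at the reference distance
distance = 2.0        # feet

for _ in range(7):
    print(f"{round(level):>3d} dB at {distance:>5.0f} ft")
    distance *= 2                     # double the distance
    level -= 20 * math.log10(2)       # inverse square: about -6 dB per doubling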

153
Maximum Amplification Condition

When the feedback to the microphone is equal to the original sound


input level, the gain can no longer be increased without an uncontrolled
increase in output, usually in the form of a loud ring at one frequency.
Practically, the sound should be kept about 3 dB below that point. If 80 dB
is the input level to the microphone, then 80 dB feedback return is an upper
bound. Using the inverse square law, the resulting levels are as shown.

154
The Limitation of Feedback

If you speak one word at the microphone at a level of 80 dB and the


loudspeaker returns that word to the microphone at 80 dB, then you can go
home. The sound system will repeat the sound all day, “chasing its tail” from
microphone to loudspeaker.
The situation depicted here is the theoretical maximum gain where the
feedback signal is equal to the input signal. It is not practical to get this kind
of gain, so you seek to stay considerably below this - at least 3dB is a com-
mon rule. Any time the feedback signal is comparable to the input signal, it
represents a distortion or degradation of the signal. The amplified signal is
coming back to the microphone with a delay and with whatever “coloring”
the sound system and the room give to it.

155
“Ringing the System”

When the gain on a sound amplification system is turned too high, the out-
put from the loudspeaker changes to an unpleasant, loud, usually high-pitched
sound. This is the result of too much feedback, but instead of reproducing
the sound being amplified, it usually produces a single pitch at the frequency
which is amplified the most by the sound system/room combination.

Note that an ideal sound system responds equally to all frequencies. This
not only gives a high fidelity reproduction of the sound, it also gives a higher
potential acoustic gain from the amplifier system. The horizontal dashed line
which represents the ideal system on the right above is still well short of the
feedback level when the other system begins to ring. The procedure for filtering
the frequency response to approach the ideal flat response is called equalization.

156
Increasing Potential Acoustic Gain

A number of steps can be taken to optimize the potential acoustic gain


of a sound reinforcement system. This gain is limited by the feedback condi-
tion. Some of these measures are strictly geometrical and can be modeled
from a simplified amplification system. Others involve more technical ap-
proaches.
Usable amplification can be increased by the geometrical factors:
1. Moving the loudspeaker further from the microphone
2. Moving the loudspeaker closer to the listener
3. Moving the source closer to the microphone

and by more technical means such as:


4. Using more directional microphones
5. Using more directional loudspeakers
6. Use of notch filters (feedback suppressors)
7. Equalizing the sound system

Moving Loudspeaker Farther from Microphone

The first practical step which is usually taken when a portable sound
system rings from feedback is to move the loudspeaker farther from the mi-
crophone. The amount of anticipated improvement in the potential acoustic
gain can be modeled for the simplified amplification system. In a real audi-
torium, you cannot achieve as much improvement as that modeled amount
because of reverberation. In general, it does no good to move the loud-
speaker out past the critical distance at which the reverberant sound field
contributes as much to feedback as the direct sound field. As a practical
measure, this critical distance for the speakers must be determined by ex-
periment, moving the speakers farther out until you can no longer increase
the gain before feedback.
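A rough way to quantify this, under the inverse-square and omnidirectional assumptions of the simplified model, is the common textbook expression for potential acoustic gain sketched below. The function name, the 6 dB feedback stability margin, and the distances are illustrative assumptions, not values from the text.

import math

def potential_acoustic_gain(d_source_mic, d_source_listener,
                            d_speaker_mic, d_speaker_listener,
                            feedback_margin_db=6.0):
    """Textbook-style estimate of potential acoustic gain under the
    inverse-square, omnidirectional assumptions (any consistent unit)."""
    pag = 20 * math.log10((d_speaker_mic * d_source_listener) /
                          (d_source_mic * d_speaker_listener))
    return pag - feedback_margin_db

# Illustrative numbers: talker 2 ft from the mic and 64 ft from the listener.
print(potential_acoustic_gain(2, 64, 10, 60))   # speaker 10 ft from the mic
print(potential_acoustic_gain(2, 64, 20, 60))   # moving it to 20 ft buys ~6 dB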

158
Critical Distance
for Speaker Placement

You can get more amplification of sound without the annoying ringing
by moving the speakers further from the microphone. However, this has
practical limits.
The direct sound field from a point source in an auditorium drops off
according to the inverse square law. To the extent that the speaker can be
considered to be a point source, then the feedback from that speaker to the
microphone is decreased by moving the speaker further away.

159
The reverberant sound field, on the other hand, more or less fills the en-
tire room and the contribution of the loudspeaker to the reverberation does
not decrease as you move the speaker further out from the microphone.
The critical distance is defined as the distance at which the reverberant
sound is equal in intensity to the direct sound. At distances greater than the
critical distance, the reverberant sound is dominant, and you get no further
increases in potential acoustic gain by moving the speakers further out.

160
Move Loudspeaker Closer to Listener

Moving the loudspeaker closer to the listener without changing the dis-
tance from the microphone will increase the available amplification. In most
practical applications, this means adding extra speakers which are closer to
the listener. However, this creates sound image problems, and the use of
digital delay of the signal to those extra speakers is recommended.

161
Move Loudspeaker Closer to Listener

One of the obvious ways to get more amplified sound to the listener is
to move the loudspeaker closer to the listener. The amount of anticipated
improvement in the potential acoustic gain can be modeled for the simpli-
fied amplification system. As a practical matter in larger auditoriums, this
means using additional speakers which are closer to the listener to add to
the sound from a main speaker cluster. A problem which arises is that the
signal from the amplifier to the distant speaker travels at the speed of light
whereas the direct sound from the source travels at the speed of sound. A
sound image problem results from the fact that the sound from the nearby
speaker reaches the listener before the sound from the visible source in the
front of the auditorium - your ear locates a sound partly by time of arrival
and therefore hears it coming from the speaker. The location conflict be-
tween your ears and eyes can be disconcerting. This is typically overcome
by using a digital delay for the sound signal going to the distant speakers.

162
Use of Digital Delay

To maintain the perception that the sound is coming from the front of
the auditorium, it is necessary to use digital delay to speakers under balco-
nies, etc., where they are much closer to the listener than the main speakers.
The signal to the speaker from the microphone travels at the speed of light,
and the sound to the listener would arrive first from the closest speaker.
Precedence has a strong localizing influence, and all the sound would seem
to be coming from the nearby speaker. With appropriate delays, the sound
to all listeners seems to come from the main speaker.
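A minimal calculation of the required delay, assuming sound travels at about 343 m/s; the distances are made up for illustration.

# Delay needed so the under-balcony speaker does not "arrive first".
# The electrical signal reaches the speaker essentially instantly, while the
# natural sound travels at roughly 343 m/s (assumed). Distances illustrative.

speed_of_sound = 343.0                 # m/s
d_main_to_listener = 40.0              # m, from the main speaker cluster
d_local_to_listener = 5.0              # m, from the under-balcony speaker

delay = (d_main_to_listener - d_local_to_listener) / speed_of_sound
print(round(delay * 1000, 1), "ms")    # ~102 ms; a few extra ms are often added
                                       # so the main cluster still arrives first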

163
Equalization

One of the most powerful tools for increasing the potential acoustic gain
of a sound amplification system is the production of a “mirror image” filter
to level the frequency response of the system. The process of equalizing an
auditorium also improves the fidelity of the sound. The process of leveling
out the frequency response of the sound system removes the peaks which
will ring the system before sufficient gain is achieved.

164
White Noise

For processes of testing and equalizing rooms and auditoriums, it is


convenient to have broad-band noise signals. Typically, white noise or pink
noise is used.
White noise is noise whose amplitude is constant throughout the audible
frequency range. It is described as “white” by analogy to white light, which
is a mixture of all visible wavelengths of light. It is fairly easy to produce
white noise - it is often produced by a random noise generator in which all
frequencies are equally probable. The illustration above includes the entire
standard audio frequency range. The sound of white noise is similar to the

sound of steam escaping from an overheated steam boiler. The ear is aware
of a lot of high frequency sound in white noise since the ear is more sensi-
tive to high frequencies. Since each successive octave of frequency will have
twice as many Hz in its range, the power in white noise will increase by
a factor of two for each octave band. Twice the power corresponds to a 3
decibel increase, so white noise is said to increase 3 dB per octave in power.
Representing the differences between white and pink noise in dB
makes the difference seem less drastic.

166
Pink Noise

Pink noise, rather than white noise, is often the choice for testing and
equalizing rooms and auditoriums. Broad-band noise signals are desirable
for such testing.
Whereas white noise is defined as sound with equal power per Hz in fre-
quency, pink noise is filtered to give equal power per octave or equal power
per 1/3 octave. Since the number of Hz in each successive octave increases
by two, this means the power of pink noise per Hz of bandwidth decreases
by a factor of two or 3 decibels per octave.
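A sketch that checks both statements numerically: white noise generated from random samples gains about 3 dB per octave, while the same noise filtered so its power falls as 1/f (one simple way to approximate pink noise) is roughly flat per octave. The sample rate, duration and filtering method are illustrative choices.

import numpy as np

rng = np.random.default_rng(0)
fs, seconds = 44100, 10
white = rng.standard_normal(fs * seconds)

# Approximate pink noise by scaling the white spectrum so power falls as 1/f
spectrum = np.fft.rfft(white)
freqs = np.fft.rfftfreq(len(white), 1 / fs)
scale = np.ones_like(freqs)
scale[1:] = 1 / np.sqrt(freqs[1:])        # amplitude ~ 1/sqrt(f) => power ~ 1/f
pink = np.fft.irfft(spectrum * scale, n=len(white))

def octave_levels(x):
    p = np.abs(np.fft.rfft(x))**2
    f = np.fft.rfftfreq(len(x), 1 / fs)
    low, levels = 20.0, []
    while low * 2 <= 20000:
        levels.append(10 * np.log10(p[(f >= low) & (f < low * 2)].sum()))
        low *= 2
    return np.round(np.array(levels) - levels[0], 1)

print("white:", octave_levels(white))   # climbs about +3 dB per octave
print("pink: ", octave_levels(pink))    # roughly flat per octave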
Since pink noise has relatively more bass than white noise, it sounds
more like the roar of a waterfall than like the higher hissing sound of white
noise.
When pink noise is chosen for equalizing auditoriums, real-time analyz-
ers can be set up so that they display a straight horizontal line when they
receive pink noise. With pink noise input to the sound system, the response

curve can be adjusted to produce pink noise in the auditorium as measured
by the real-time analyzer. This provides optimum fidelity as well as increases
the potential acoustic gain of the sound amplification system.

With pink noise, the intensity is filtered to drop 30 dB over the 10 octave
audible frequency range.
The difference between pink noise and white noise is exaggerated in the
top illustration by the process of making the vertical axis linear with in-
tensity. The reason is that the ear is definitely not linear in its response to
sound. The sound intensity of pink noise drops by a factor of 1000 over the
audible frequency range, and that sounds very drastic. That drop should
be considered in light of the “rule of thumb” for loudness perception: the
fact that dropping the sound intensity by a factor of 10 or 10dB results in a
sound that is perceived to be half as loud to the human ear. If each 10dB of
drop results in a sound half as loud, then a 30dB drop will result in a sound
perceived as 1/8 as loud - significantly less, to be sure, but not as drastic as
the factor of 1/1000 in intensity would imply.

168
Glossary :
Inverse Square Law :
The sound intensity from a point source of sound will obey the inverse
square law if there are no reflections or reverberation.
Reverberation :
It is the collection of reflected sounds from the surfaces in an enclosure
like an auditorium.
Sound Synthesis :
Periodic electric signals can be converted into sound by amplifying them
and driving a loudspeaker with them.
Faraday’s Law:
Any change in the magnetic environment of a coil of wire will cause a
voltage (emf) to be “induced” in the coil.

Binary Coded Decimal


One of the most widely used representations of numerical data is the bi-
nary coded decimal (BCD) form in which each integer of a decimal number
is represented by a 4-bit binary number

MIDI for Music


Musical Instrument Digital Interface (MIDI) is a data transfer protocol
which is widely used with music synthesizers.

169
Atmospheric Pressure
The surface of the earth is at the bottom of an atmospheric sea. The
standard atmospheric pressure is measured in various units:

1 atmosphere = 760 mmHg = 29.92 inHg = 14.7 lb/in² = 101.3 kPa

Threshold of Hearing
Sound level measurements in decibels are generally referenced to a standard threshold of hearing at 1000 Hz for the human ear, which can be stated in terms of sound intensity as I0 = 10^-12 watts/m^2.
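
The decibel level of any intensity I follows from this reference (a minimal sketch):

    import math

    I0 = 1e-12   # W/m^2, standard reference intensity (threshold of hearing at 1000 Hz)

    def intensity_level_db(I):
        return 10 * math.log10(I / I0)

    print(intensity_level_db(1e-12))   # 0 dB, the threshold itself
    print(intensity_level_db(1.0))     # 120 dB, roughly the threshold of pain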
Sensitivity of Human Ear
The human ear can respond to minute pressure variations in the air if
they are in the audible frequency range, roughly 20 Hz - 20 kHz.
Audible Sound
Usually “sound” is used to mean sound which can be perceived by the
human ear, i.e., “sound” refers to audible sound unless otherwise classified.
Transverse Waves
For transverse waves the displacement of the medium is perpendicular
to the direction of propagation of the wave.
Longitudinal Waves
In longitudinal waves the displacement of the medium is parallel to the
propagation of the wave. A wave in a “slinky” is a good visualization.

Fourier Analysis and Synthesis
The mathematician Fourier proved that any continuous function could
be produced as an infinite sum of sine and cosine waves.
Fast Fourier Transforms
Fourier analysis of a periodic function refers to the extraction of the series of sines and cosines which, when superimposed, will reproduce the function. The fast Fourier transform (FFT) is an efficient algorithm for computing that decomposition from a sampled signal.
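
As a sketch of the idea in practice (assuming the numpy library is available), the FFT of a sampled signal recovers the frequencies of the sine components it contains:

    import numpy as np

    rate = 1000                           # samples per second
    t = np.arange(rate) / rate            # one second of sample times
    signal = np.sin(2 * np.pi * 50 * t) + 0.5 * np.sin(2 * np.pi * 120 * t)

    spectrum = 2 * np.abs(np.fft.rfft(signal)) / len(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / rate)

    print(freqs[spectrum > 0.1])          # -> [ 50. 120.], the two components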
Musical Intervals
The term musical interval refers to a step up or down in pitch which is
specified by the ratio of the frequencies involved.
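
For example, starting from an assumed reference of A4 = 440 Hz:

    A4 = 440.0
    octave = A4 * 2                # 880 Hz, ratio 2:1
    perfect_fifth = A4 * 3 / 2     # 660 Hz, ratio 3:2
    semitone = A4 * 2 ** (1 / 12)  # about 466.2 Hz, one equal-tempered semitone up
    print(octave, perfect_fifth, round(semitone, 1))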
Period: the time required to complete a full cycle, T, in seconds/cycle
Frequency: the number of cycles per second, f, in 1/seconds or hertz (Hz)
Amplitude: the maximum displacement from equilibrium, A
FM Stereo Broadcast Band
The bandwidth assigned to each FM station is sufficiently wide to broad-
cast high-fidelity, stereo signals.

About the author

Dr. Ibrahim Elnoshokaty


Dr. Ibrahim Elnoshokaty was born in May 1977. He is a member of the Egyptian acoustical committee that drafted the code for the foundations of the design and execution of acoustics and noise control in buildings, a member of the Acoustical Society of America, and a founder of the Acoustical Society of Egypt at the Faculty of Engineering, ASU. He holds a PhD in electroacoustics from Georgia State University (2006) and began his career after finishing his BA with a major in sound engineering in 1999. He started as a sound engineer in cinema production and worked on several major films, including al-Sadat days and Sahar Elayaly. He set up the first commercial radio station in Egypt as its sound department manager in 2003, which proved a great success, and he activated the first RDS service in the Arab world. He then worked as project manager for the Melody radio network (Lebanon, Syria, Jordan), a network connected by RDS alternative frequencies; during that period he obtained his master's degree from Georgia State University (2003). After the master's degree he became technical director for Melody TV and launched several well-known, crystal-clear channels, including Melody Drama, Melody Tunes, Melody Aflam, Melody Classic and Melody Trix. He then moved to Modern Sport as CTO and launched the first free-to-air sports channel, which became the number one channel in Egypt in 2007.
During that time he founded his own business, Enoshmink Technology, in 2005. It remained a very small company until 2007, when GN4me began a promising project that rolled out 400 cinema screens over 4 years, and that project was carried out by Enoshmink Tech. Dr. Ibrahim Elnoshokaty's efforts in that field included customization, research and development of sound isolation materials and real-time equalization of auditoriums, which improved the movie-watching experience for audiences. He built the first rear-projection screen in Egypt (Deep Mail, Alex) and the first auditorium at Elnajaf Elasharf, Iraq. His company also works on sound innovations such as sound-treatment painting, which was approved by the Elmasa Hotel in the Administrative Capital city and is used to reduce reverberation in its dome hall. Its other products include Enocrso, a monitoring and management system for public address systems, and Sharedin, an online plug-and-play radio and social media posting platform. He has also received a number of awards, listed in the following section.
During his PhD he obtained the following certificates (2006):
He holds a certificate of Excellence from Georgia State University, U.S.A.
He holds a certificate of Distinction from Georgia State University, U.S.A.
He holds a Bachelor of Sound Engineering (1999).

HONOURS AND ACTIVITIES
Member of the Acoustical Society of America.
Member of the Acoustical Society of Egypt.
Member of the International Society of Physics.
Appreciation Certificate from the Housing & Building National Research Center, 2013.
Appreciation Certificate from Cofermetal, 2012.
Appreciation Certificate from Armstrong, 2011.
Appreciation Certificate from Palestine Radio, 2009.
Appreciation Certificate from Ecreso (FM transmitter manufacturer), 2008.
Appreciation Certificate from the Egyptian Radio and Television Union, 2007.
Appreciation Certificate from Nugoom F.M, 2007.
Appreciation Certificate from Spin F.M, 2007.
Appreciation Certificate from Sout Elmadena, 2007.
Appreciation Certificate from Modern Sport, 2005.
Appreciation Certificate from Melody, 2002.
