Proceedings of the National Academy of Sciences of the United States of America,
1939, vol. 25, no. 7 (Jul. 15, 1939), pp. 377-383
VOL. 25, 1939 PHYSICS: H. DUDLEY 377
Writing
$$\Phi_3(w) - \bar{\Phi}_3(w) = \psi(w) \tag{11}$$
(an analytic function of w), we have by (9) that $\psi'(w) \equiv 0$. Therefore
$\psi(w) \equiv$ constant, and by allowing w to be real in (11), we see that this con-
stant must be pure imaginary:
$$\Phi_3(w) = \bar{\Phi}_3(w) + iC.$$
Hence
$$\Re\,\Phi_3(w) = \Re\,\bar{\Phi}_3(w). \tag{12}$$
The equations
$$x = \Re\,\Phi_1(w), \quad y = \Re\,\Phi_2(w), \quad z = \Re\,\Phi_3(w) \tag{13}$$
define the required minimal surface $\mathfrak{M}$ coinciding with M for v > 0. We
see by (5) and (12) that the law of prolongation of M is
$$x(u, -v) = -x(u, v), \quad y(u, -v) = -y(u, v), \quad z(u, -v) = z(u, v) \tag{14}$$
which represents a symmetry in the line l, as stated in our theorem.
1 "The Analytic Prolongation of a Minimal Surface over a Rectilinear Segment of
Its Boundary," Duke Math. Jour., 5, 21-29 (March, 1939).
We may take this occasion to observe that in the statement, loc. cit., p. 24, lines 12-
16, the difference quotient must be interpreted as a simultaneous difference quotient:
{f(u + h) - f(u + k)}/(h - k), whose limit must be taken as h, k → 0 together. The
proof given of the existence of Z. on ab is easily adjusted to this interpretation.
2 The exact meaning of this is discussed in the paragraph following formula (3).
3 See the reference in footnote 2.
4 The slit is along the u-axis from b (inclusive) to +∞ and from a (inclusive) to -∞.
THE AUTOMATIC SYNTHESIS OF SPEECH
By HOMER DUDLEY
BELL TELEPHONE LABORATORIES
Read before the Academy, April 24, 1939
The synthesis of speech has long appealed to the mind of man. There
is a detailed report by Von Kempelen1 of his experiments with mechanical
models of speaking machines around 1780. Since then many famous
names including Wheatstone and Helmholtz have been associated with
some form of speech synthesis. The automatic synthesis2 of speech de-
scribed in the present paper has been made possible by the powerful tech-
niques associated with the developments in communication circuits and
apparatus in recent years.
This automatic synthesis of speech is carried out in two steps, the re-
making of the speech by an electrical synthesizing circuit being preceded
by an electrical analysis of the speech it is desired to remake. The over-
all process thus uses electrical circuits to copy a man repeating speech as
he hears it. By analogy the analyzer serves for an artificial ear and the
synthesizer for an artificial vocal system. Actually the synthesizer was
designed to be a somewhat simplified equivalent of the human vocal sys-
tem in the essential steps of speech production whereas the analyzer was
designed on the basis of working satisfactorily with the synthesizer rather
than as an artificial equivalent of the ear. In view of this line of develop-
ment the broad features of the synthesizer will be touched upon before
giving a more detailed discussion of the analyzing method used.
The synthesizer fashions its synthetic speech out of two basic sound
streams, namely, a buzzer-like tone from a relaxation oscillator which for
brevity will hereafter be referred to as a buzz and a noise from random
vibrations in the electrical current in a gas tube which will hereafter be
referred to as a hiss from the nature of its sound. The speech sounds in
general can be fairly well simulated with one or the other of these basic
streams of sound as the raw material. The buzz consists of a fundamental
frequency and its harmonics and is employed for making the voiced
sounds, while the hiss has a random nature corresponding to the presence
of components at all vibration rates in the audible range used here and is
employed for the unvoiced sounds. In fashioning the synthetic speech
sounds from these sound streams, two modulating or control processes
are applied. The fundamental frequency of the buzz is varied to simu-
late the pitch of voiced speech with its inflection while the spectral varia-
tion of amplitude with frequency for the various speech sounds, whether
voiced or unvoiced, is simulated by controlling the relative amounts of
power transmitted in fixed frequency bands. With this general procedure
for constructing synthetic speech as a background the more important
details of the analyzing and synthesizing operation will be filled in.
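The two basic sound streams can be imitated numerically. The following is a minimal sketch only, not the circuit of the paper: the sawtooth waveform standing in for the relaxation oscillator, the uniform random numbers standing in for the gas-tube noise, the sampling rate, and the function names `buzz` and `hiss` are all assumptions made for illustration.

```python
import random

SAMPLE_RATE = 8000  # samples per second; an assumed value, not from the paper


def buzz(pitch_hz, n_samples, sample_rate=SAMPLE_RATE):
    """Sawtooth wave: a fundamental frequency plus its harmonics (the 'buzz').

    The sawtooth is an assumed stand-in for the relaxation oscillator output.
    """
    out = []
    for i in range(n_samples):
        phase = (i * pitch_hz / sample_rate) % 1.0  # position within one period
        out.append(2.0 * phase - 1.0)               # ramp from -1 up toward +1
    return out


def hiss(n_samples, seed=0):
    """Random samples with energy spread over all frequencies (the 'hiss')."""
    rng = random.Random(seed)  # seeded so that the sketch is repeatable
    return [rng.uniform(-1.0, 1.0) for _ in range(n_samples)]
```

With a pitch of 100 cycles the buzz repeats every 80 samples at the assumed rate, while the hiss has no period at all; this is the distinction the synthesizer relies on for voiced and unvoiced sounds.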
The Speech Analyzer.-Figure 1 shows the overall circuit for remaking
speech with the analyzer at the left and the synthesizer at the right.
Electrical speech waves from the microphone are analyzed for pitch by the
top channel and for spectrum by a group of channels at the bottom.
In the pitch analysis the fundamental frequency, which for simplicity
will be called the pitch, is measured by a circuit containing a frequency
discriminating network for obtaining the fundamental frequency in
reasonably pure form, a frequency meter for counting by more or less
uniform pulses the current reversals therein and a filter for eliminating
the actual speech frequencies but retaining a slowly changing current
that is a direct measure of the pitch. Unvoiced sounds, whether in
whispering or the unvoiced sounds of normal speech, have insufficient
power to operate the frequency meter. The output current of the pitch
channel is then a pitch-defining signal with its current approximately
proportional to the pitch of the voiced sounds and equal to zero for the un-
voiced sounds.
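The behavior of the pitch channel can be sketched in a few lines. This is an illustration under stated assumptions rather than the circuit itself: the input is assumed to be already reduced to its fundamental, the counting is done over fixed frames instead of through a smoothing filter, and the name `pitch_signal` and the power threshold are hypothetical.

```python
def pitch_signal(samples, sample_rate, frame_len, power_threshold=1e-3):
    """Return one pitch value (cycles per second) per frame.

    Frames whose mean power falls below the assumed threshold give 0.0,
    mimicking a frequency meter that unvoiced sounds cannot operate.
    """
    pitches = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        power = sum(s * s for s in frame) / frame_len
        if power < power_threshold:
            pitches.append(0.0)  # too weak: treated as unvoiced or silent
            continue
        # Count current reversals (sign changes) inside the frame.
        reversals = sum(
            1 for a, b in zip(frame, frame[1:]) if (a < 0.0) != (b < 0.0)
        )
        # Two reversals per cycle, so cycles per second = reversals / (2 * duration).
        pitches.append(reversals * sample_rate / (2.0 * frame_len))
    return pitches
```

The output is a slowly changing quantity, one number per frame, proportional to the pitch for voiced input and zero otherwise, which is the property the text asks of the pitch-defining signal.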
The analysis for spectrum is made easier by inserting the indicated
predistorting equalizer having the loss characteristic given in figure 2 to
obtain a fairly uniform amount of power in the various channels. While
[Block diagram. Analyzer (left): pitch channel with frequency discriminator, frequency meter and 0-25 cycle filter; predistorting equalizer feeding the spectrum channels, each with band filter, rectifier and 0-25 cycle filter (No. 1, 0-250 cycles; No. 2, 250-550 cycles; 8 other spectrum channels covering the frequency range 550-2950 cycles in 300-cycle bands). Synthesizer (right): relaxation oscillator with pitch control, random noise source, energy source switch, modulators, band filters and restoring equalizer, leading to speech out.]
FIGURE 1
Schematic circuit for the automatic synthesis of speech.
this equalizer results in a better working of the spectrum circuits it produces
no net distortion in the speech characteristic because its effect is annulled
by the restoring equalizer at the end of the synthesizer. There are 10
spectrum-analyzing channels, the first one handling the frequency range
0-250 cycles and the other nine, the bands, 300 cycles wide, extending from
250 cycles to 2950 cycles, the top frequency being chosen as representative
of commercial telephone circuits. Each spectrum-analyzing channel con-
tains the proper band filter followed by a rectifier for measuring the
power therein and a 25-cycle low-pass filter for retaining the current indica-
tive of this power but eliminating any of the original speech frequencies.
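The band layout and the measuring part of one spectrum channel may be sketched as follows. The band edges come from the text; everything else is an assumption for illustration: the band filter is taken as already applied, the rectifier is taken to be full-wave, the 25-cycle low-pass filter is replaced by a single-pole smoother, and the names `band_edges` and `envelope` are hypothetical.

```python
import math


def band_edges():
    """The 10 analyzing bands in cycles per second: 0-250, then nine 300-cycle bands."""
    edges = [(0, 250)]
    low = 250
    for _ in range(9):
        edges.append((low, low + 300))
        low += 300
    return edges


def envelope(band_signal, sample_rate, cutoff_hz=25.0):
    """Rectify a band-limited signal and smooth it to a slowly varying level.

    A one-pole low-pass is an assumed simplification of the 25-cycle filter.
    """
    alpha = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate)
    level = 0.0
    out = []
    for s in band_signal:
        level += alpha * (abs(s) - level)  # abs() plays the part of the rectifier
        out.append(level)
    return out
```

The last band returned is 2650-2950 cycles, in agreement with the top frequency quoted above. For a steady tone of unit amplitude the smoothed level settles near 2/π, the mean of a full-wave rectified sine.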
The operation of the analyzer is illustrated in figure 3 by a group of os-
cillograms taken in analyzing the sentence "She saw Mary." To insure
that the same speech was analyzed in obtaining the various oscillograms,
the sentence was recorded on a high quality magnetic tape recorder and
reproductions therefrom used for the oscillograph tracings. The speech
wave input to the analyzer after passing the predistorting equalizer is
shown in the line next to the bottom while the output is shown in
the other oscillogram traces, the pitch-defining signal being below the
speech wave and the 10 spectrum-defining signals above it. For con-
venient reference the oscillograms are lined up together whereas in the
actual circuit the speech-defining signals lag about 17 milliseconds behind
the speech input wave. While the inaudible speech-defining output sig-
nals contain all the essential speech information of the input wave, it is
noted that they are slow-changing, in this way corresponding to lip or
[Plot: equalizer loss (vertical scale 0 to 20) against frequency in cycles per second (0 to 3600), with curves labeled for the predistorting and restoring equalizers.]
FIGURE 2
Loss characteristics of the equalizers in the circuit of figure 1.
tongue motions, as contrasted with the rapid-changing speech wave itself
composed of the much higher audible vibration rates. The dropping of
the pitch to zero for the unvoiced sounds "sh" and "s" is also readily seen.
The Speech Synthesizer.-Figure 3 shows the effect of the synthesizing
process as well as of the analyzing process. In the analyzer the speech
wave is the input and the 11 speech-defining signals are the output while
in the synthesizer the 11 speech-defining signals are the input and the
speech wave the output as well as can be indicated on such a condensed
scale.
The steps in speech synthesis mentioned previously can be followed
readily in the synthesizer circuit shown at the right of figure 1. The re-
laxation oscillator is the source of the buzz and the random noise circuit
the source of the hiss. The hiss is connected in circuit for unvoiced sounds
[Oscillogram traces, top to bottom: the ten spectrum-defining signals, labeled by band in cycles per second from 2650-2950 down through 250-550 to 0-250; the speech wave; the pitch-defining current. Horizontal axis: time in seconds, 0 to 1.0.]
FIGURE 3
Oscillogram of the speech wave (0-3000 cycles) and the speech-defining signals (0-25 cycles) for the sente
and for quiet intervals. (In the latter case the zero sound output from the
synthesizer results from the zero currents in the spectrum channels.)
When a voiced sound is analyzed a pitch current other than zero is re-
ceived from the analyzer with the result that the buzz is set for the cor-
rect pitch by the "pitch control" on the relaxation oscillator and at the
same time the relay marked "energy source switch" operates switching
from the hiss source to the buzz source.
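The action of the pitch control and the energy source switch can be put in a short sketch. It is an illustration only: the sample-by-sample pitch track, the sawtooth buzz, the uniform random hiss and the name `excitation` are assumptions, not details given in the paper.

```python
import random


def excitation(pitch_track, sample_rate, seed=0):
    """Build the source signal from a pitch track given sample by sample.

    pitch_track[n] > 0 selects the buzz at that pitch; 0 selects the hiss.
    """
    rng = random.Random(seed)  # seeded so that the sketch is repeatable
    phase = 0.0
    out = []
    for pitch_hz in pitch_track:
        if pitch_hz > 0.0:
            # Relay on the buzz side: the oscillator follows the pitch current.
            out.append(2.0 * phase - 1.0)
            phase = (phase + pitch_hz / sample_rate) % 1.0
        else:
            # Relay on the hiss side: zero pitch current.
            out.append(rng.uniform(-1.0, 1.0))
    return out
```

Accumulating the phase, rather than recomputing it from the sample index, lets the pitch change from sample to sample without a jump in the waveform.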
The outputs from the spectrum-analyzing channels are fed to the
proper synthesizing spectrum controls with the band filters lined up to cor-
respond. The power from the synthesizer energy sources in these various
bands is then passed through modulators under the control of the spectrum-
defining currents so that the power output in each filtered band from the
synthesizer is proportional to that measured by the analyzer in the original
speech. After the restoring equalizer there is, then, synthetic speech
having approximately the same pitch and the same spectrum as the origi-
nal speech. The synthetic speech lags the original speech by about 17
milliseconds due to the inherent delay in electrical circuits of the types
used.
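The modulating step can be illustrated in isolation. In this sketch the source energy is assumed to have been separated into bands already, each modulator is assumed to be a simple multiplier, and the restoring equalizer is left out; the name `synthesize` and the list-of-lists layout are hypothetical.

```python
def synthesize(band_sources, band_controls):
    """Sum the band signals, each scaled by its spectrum-defining signal.

    band_sources[k][n]:  source energy (buzz or hiss) already filtered into band k.
    band_controls[k][n]: spectrum-defining signal for band k from the analyzer.
    """
    n_samples = len(band_sources[0])
    out = [0.0] * n_samples
    for source, control in zip(band_sources, band_controls):
        for n in range(n_samples):
            out[n] += control[n] * source[n]  # one modulator per band
    return out
```

When every control signal is zero the output is zero whatever the source, which is how quiet intervals come out silent although the hiss remains connected.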
Application.3-It may be expected that this device will prove useful
in making speech studies, especially those of a type in which the ear is the
ultimate judge. With the speech-defining signals now made available
as independent variables, they are put under the control of the experi-
menter. Thus speech can not only be reproduced but literally remade to
new specifications. Observations may be made on the transformation of
voiced speech to whispered speech, on the independent contributions of
voiced and unvoiced sounds, and on the separate parts played by spectrum
and inflection. The association of intelligibility with spectrum and emo-
tional content with pitch change can be made strikingly apparent. The
voice pitch may be greatly decreased or greatly increased; its range may
be augmented, diminished or held to a perfect monotone; its inflections
may be inverted. Radical speech changes are possible such as splitting
the analyzing equipment to obtain hybrid voices formed with the pitch
of one talker and the spectrum of another.
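Several of the pitch manipulations just listed amount to simple operations on the pitch-defining signal before it reaches the synthesizer. The sketch below is one possible reading under stated assumptions: pitch is held as a list of values in cycles per second with 0 for unvoiced sounds, inflection is measured about the mean voiced pitch, and the name `transform_pitch`, its parameters and the 20-cycle floor are hypothetical.

```python
def transform_pitch(pitch_track, scale=1.0, range_factor=1.0,
                    monotone_hz=None, min_hz=20.0):
    """Remake a pitch track; zeros (unvoiced sounds) are left at zero.

    scale:        raises or lowers the whole voice pitch.
    range_factor: >1 augments the inflection, <1 diminishes it, -1 inverts it.
    monotone_hz:  if given, every voiced value is held at this pitch.
    min_hz:       assumed floor that keeps voiced values above zero.
    """
    voiced = [p for p in pitch_track if p > 0.0]
    if not voiced:
        return list(pitch_track)
    mean = sum(voiced) / len(voiced)
    out = []
    for p in pitch_track:
        if p <= 0.0:
            out.append(0.0)
        elif monotone_hz is not None:
            out.append(monotone_hz)
        else:
            out.append(max(min_hz, scale * (mean + range_factor * (p - mean))))
    return out
```

Replacing the whole track by zeros would leave only the hiss as source, which corresponds to the transformation of voiced speech to whispered speech mentioned above.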
As to the engineering possibilities which may grow out of the applica-
tion of the principles employed in this device, it is hard to predict at the
present time. The speech-defining currents do have features of simplicity
and inaudibility which may open the way to new types of privacy or to a
reduction in the frequency range required for the transmission of intelligible
telephonic speech.
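A rough figure for the possible reduction follows from numbers already given: 11 speech-defining signals of 0-25 cycles each, against a speech band reaching 2950 cycles. The arithmetic below is only a back-of-envelope sketch; it assumes the signals could be stacked side by side with no allowance for guard bands or for the method of transmission, and the names are hypothetical.

```python
N_SIGNALS = 11               # 1 pitch-defining + 10 spectrum-defining signals
SIGNAL_BANDWIDTH = 25        # cycles per second occupied by each signal
SPEECH_TOP_FREQUENCY = 2950  # cycles per second, top of the analyzed band


def control_bandwidth(n_signals=N_SIGNALS, per_signal=SIGNAL_BANDWIDTH):
    """Total range occupied by the defining signals under the stated assumption."""
    return n_signals * per_signal


def reduction_ratio():
    """Speech band divided by the total range of the defining signals."""
    return SPEECH_TOP_FREQUENCY / control_bandwidth()
```

On these assumptions the defining signals occupy 275 cycles, somewhat less than a tenth of the original range.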
1 Mechanismus der menschlichen Sprache nebst der Beschreibung seiner sprechenden
Maschine, 1791.
2 The synthesizing apparatus in its
earlier stages of development was demonstrated
during an address by Dr. Frank B. Jewett at the Harvard Tercentenary Conference of
Arts and Sciences, September 11, 1936, on "The Social Implications of Scientific Re-
search in Electrical Communication," Bell Telephone Quarterly, 15, 205-218 (1936),
and Scientific Monthly, 43, 466-476 (1936). A description of this demonstration is given
in a brief article on "Synthesizing Speech," by H. Dudley in the Bell Laboratories
Record, 15, 98-102 (1936).
3 When this paper was presented at the 1939 Spring meeting, phonograph records
were used to illustrate some of these possibilities as well as to show the important steps
in the automatic synthesis of speech.
THE PHOTOGRAPHY OF INTERATOMIC DISTANCE VECTORS
AND OF CRYSTAL PATTERNS
By M. J. BUERGER
MINERALOGICAL LABORATORY, MASSACHUSETTS INSTITUTE OF TECHNOLOGY
Communicated June 7, 1939
1. Introduction.-In a recent note,1 W. L. Bragg has described a
method of optically accomplishing the two-dimensional Fourier summa-
tion
$$\rho_{x,z} = \sum_{h}\sum_{l} F_{h0l} \cos\left\{\frac{2\pi h x}{a} + \frac{2\pi l z}{c} - \alpha_{h0l}\right\} \tag{1}$$
which gives the electron density, $\rho$, of all points, x, z, of a crystal structure,
projected on (010). In Bragg's method, a complete zero-layer reciprocal
lattice level is laid out on a thin brass plate, and at each lattice point, a
hole is drilled having an area proportional to $F_{h0l}$. This perforated plate
is placed between a pair of lenses and illuminated by a monochromatic
point source placed at the focus of the first lens. The wavelets diffracted
from the holes in the brass plate have the correct amplitudes and spacings
to give diffraction equivalent to the above Fourier summation, so that the
second lens gives an image of the electron density projected on (010),
i.e., a picture of the crystal structure itself. The arrangement provides a
practical method of rapidly synthesizing crystal structures from x-ray dif-
fraction data in the cases of crystals, all of whose diffraction spectra have
the same phase, i.e., $\alpha_{h0l} = 0$. This situation obtains in cases of crystals
having a center of symmetry in the plane of the projected pattern, and
having sufficiently heavy atoms at these symmetry centers.
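The summation (1), which the optical arrangement carries out physically, can also be evaluated numerically at a single point. The sketch below is illustrative only: the function name `projected_density`, the dictionary layout keyed by (h, l), and the amplitude values in the usage note are hypothetical, and no real diffraction data are implied.

```python
import math


def projected_density(x, z, a, c, amplitudes, phases=None):
    """Evaluate the sum over (h, l) of F_h0l * cos(2*pi*h*x/a + 2*pi*l*z/c - alpha_h0l).

    amplitudes: dict mapping (h, l) to F_h0l.
    phases:     dict mapping (h, l) to alpha_h0l in radians; a missing entry,
                or phases=None, is taken as alpha = 0 (all spectra in phase).
    """
    total = 0.0
    for (h, l), f in amplitudes.items():
        alpha = 0.0 if phases is None else phases.get((h, l), 0.0)
        total += f * math.cos(2.0 * math.pi * (h * x / a + l * z / c) - alpha)
    return total
```

For example, with the made-up terms `{(0, 0): 2.0, (1, 0): 1.0}` and a = c = 1, the sum is 3.0 at the origin and 1.0 at x = 0.5, where the (1, 0) term has reversed sign.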
In the present note, suggestions are made for (a) extension of the method
to cases where $\alpha$ is either 0 or $\pi$, i.e., to any crystal having a projected
center of symmetry, (b) extension of the method to Patterson and Harker
summations for determining interatomic distance vectors and (c) reduc-
tion of many of the steps involved to purely photographic processes, so