Rule-Based Synthesis
LING 285
Spring 2021
Mary Byram Washburn
Rule-Based Synthesis
Physical Properties:
- Harmonics
  - f0
  - Decrease in amplitude
- Formants
  - F1 and F2 the loudest
Rule-Based Synthesis
• Rule-Based Speech Synthesis:
  • Imitation of the physical properties of speech, using something other than the human voice
• Synthesize the acoustics of speech
  • Harmonics
    • f0
    • Decrease in amplitude
  • Formants
    • F1 and F2
Pure Tone =
Simple Signal
Pure Tone Synthesis
• Lots of Pure Tones
  • One for each:
    • Harmonic
      • f0
      • Decrease in amplitude
    • Formant
      • F1 and F2 the loudest
[Figure: frequency axis, 0 to 5000 Hz]
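The idea of building a sound from lots of pure tones, one per harmonic, can be sketched in Python. This is a minimal illustration, not the method used in the course: the sample rate, the 1/k amplitude roll-off, and the function names `pure_tone` and `harmonic_stack` are all assumptions made for the example.

```python
import math

SAMPLE_RATE = 16000  # samples per second (assumed value)


def pure_tone(freq_hz, amplitude, duration_s, sample_rate=SAMPLE_RATE):
    """Return the samples of one sine wave (a pure tone)."""
    n = round(duration_s * sample_rate)
    return [amplitude * math.sin(2 * math.pi * freq_hz * i / sample_rate)
            for i in range(n)]


def harmonic_stack(f0_hz, n_harmonics, duration_s, sample_rate=SAMPLE_RATE):
    """Add one pure tone per harmonic: f0, 2*f0, 3*f0, ...

    The k-th harmonic gets amplitude 1/k. That roll-off is an assumption
    standing in for the "decrease in amplitude" of higher harmonics.
    """
    n = round(duration_s * sample_rate)
    total = [0.0] * n
    for k in range(1, n_harmonics + 1):
        tone = pure_tone(k * f0_hz, 1.0 / k, duration_s, sample_rate)
        total = [a + b for a, b in zip(total, tone)]
    return total


# Ten harmonics of a 200 Hz f0, half a second long.
samples = harmonic_stack(f0_hz=200, n_harmonics=10, duration_s=0.5)
```

Each call to `pure_tone` is one simple signal; the complex signal is just their sum.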
Vowel Acoustics
For Vowels:
• 2 tones (F1 and F2)
[Figure labels: 300 Hz, 2500 Hz]
Rule-Based Synthesis
Synthesized /æ/
F1: high, 800 Hz
F2: high, 1800 Hz
f0: female, 200 Hz
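The /æ/ settings above (f0 of 200 Hz, F1 at 800 Hz, F2 at 1800 Hz) can be turned into a set of harmonic amplitudes. The sketch below is one possible reading of the rule, not the slide's actual procedure: the baseline amplitude of 0.5/k, the formant amplitude of 1.0, the 5000 Hz ceiling, and the function names are assumptions chosen so that the F1 and F2 harmonics come out loudest.

```python
import math

SAMPLE_RATE = 16000  # samples per second (assumed value)


def vowel_harmonic_amplitudes(f0_hz, formants_hz, top_hz=5000,
                              base=0.5, formant_amp=1.0):
    """Map each harmonic frequency to an amplitude.

    Every harmonic k*f0 starts at base/k (assumed roll-off). The harmonic
    closest to each formant is then raised to formant_amp (assumed value),
    which makes the formant harmonics the loudest components.
    """
    n_harmonics = int(top_hz // f0_hz)
    amps = {k * f0_hz: base / k for k in range(1, n_harmonics + 1)}
    for formant in formants_hz:
        nearest = min(amps, key=lambda f: abs(f - formant))
        amps[nearest] = formant_amp
    return amps


def synthesize(amps, duration_s, sample_rate=SAMPLE_RATE):
    """Sum one pure tone per (frequency, amplitude) pair."""
    n = round(duration_s * sample_rate)
    return [sum(a * math.sin(2 * math.pi * f * i / sample_rate)
                for f, a in amps.items())
            for i in range(n)]


# Values from the slide: f0 = 200 Hz, F1 = 800 Hz, F2 = 1800 Hz.
ae_amps = vowel_harmonic_amplitudes(f0_hz=200, formants_hz=[800, 1800])
ae_samples = synthesize(ae_amps, duration_s=0.3)
```

With these values, 800 Hz and 1800 Hz are exact multiples of the f0 (the 4th and 9th harmonics), so each formant lands on a single harmonic.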
Rule-Based Synthesis
natural /æ/
synthesized /æ/
Consonant Acoustics
Stops: involve 2 gestures, closure and release
Kick /kɪk/, Pip /pɪp/
Fricatives: just 1 gesture
Fish /fɪʃ/, Sis /sɪs/
Consonant Acoustics: Fricatives
mesh, mess?
[Figure: panels labeled Mesh and Mess, each with a 5000 Hz axis mark]
Rule-Based Synthesis
Synthesizing Consonants
• Synthesize the acoustics of speech
  • Stops:
    • Silence during closure gesture
    • A lot of high energy when there’s aspiration on the release gesture
  • Fricatives
    • Energy is high and long
    • Sibilants are louder than other fricatives
      • /s/: starts >4000 Hz
      • /ʃ/: starts ~3000 Hz
  • Nasals
    • Nasal Murmur: loud at ~250 Hz
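The consonant rules above can be sketched as small Python functions that each return a list of samples. This is a rough illustration under stated assumptions: only the lower edges for /s/ and /ʃ/ and the 250 Hz murmur come from the slide, while the durations, the upper band edges, the number of tones, and every function name are invented for the example. Noise is approximated here by summing pure tones at random frequencies.

```python
import math
import random

SAMPLE_RATE = 16000  # assumed; must be more than twice the highest frequency


def silence(duration_s, sample_rate=SAMPLE_RATE):
    """Closure gesture of a stop: no energy at all."""
    return [0.0] * round(duration_s * sample_rate)


def noise_band(low_hz, high_hz, duration_s, n_tones=40, seed=0,
               sample_rate=SAMPLE_RATE):
    """Approximate noise as many pure tones at random frequencies
    (and random starting phases) between low_hz and high_hz."""
    rng = random.Random(seed)
    tones = [(rng.uniform(low_hz, high_hz), rng.uniform(0, 2 * math.pi))
             for _ in range(n_tones)]
    n = round(duration_s * sample_rate)
    return [sum(math.sin(2 * math.pi * f * i / sample_rate + p)
                for f, p in tones) / n_tones
            for i in range(n)]


def stop(closure_s=0.08, release_s=0.03):
    """Silence for the closure, then a short high-frequency burst for an
    aspirated release. Durations and the 3000-7000 Hz band are assumed."""
    return silence(closure_s) + noise_band(3000, 7000, release_s)


def fricative(kind, duration_s=0.15):
    """Long stretch of high-frequency noise. Lower edges follow the slide:
    above 4000 Hz for /s/, about 3000 Hz for /ʃ/ (key "sh")."""
    low_hz = {"s": 4000, "sh": 3000}[kind]
    return noise_band(low_hz, 7500, duration_s)  # 7500 Hz top is assumed


def nasal_murmur(duration_s=0.1, sample_rate=SAMPLE_RATE):
    """A single strong tone at 250 Hz, the murmur frequency on the slide."""
    n = round(duration_s * sample_rate)
    return [math.sin(2 * math.pi * 250 * i / sample_rate) for i in range(n)]
```

For example, `stop() + nasal_murmur()` would place a stop before a nasal; the slide's rule that sibilants are louder than other fricatives could be added as a gain factor on `fricative`.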
Consonant Acoustics
nap, pam, pop?
[Figure: panels labeled pam, pop, nap]
3412, 3512, 3612, 3712
Consonant Acoustics: Voicing
[Figure: panels labeled [z] and [s]]
Rule-Based Synthesis
Synthesizing Consonants
• Synthesize the acoustics of speech
Voicing
Voiced: harmonics
• Frequencies at intervals of the f0
Unvoiced: noise
• Frequencies at random intervals
• Do not have harmonics
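The contrast between voiced and unvoiced sounds comes down to which frequencies are present. The sketch below lists the component frequencies for each case so the spacing can be compared directly; the function names, the frequency ranges, and the fixed random seed are assumptions for illustration.

```python
import random


def voiced_components(f0_hz, top_hz):
    """Harmonics: frequencies at whole-number multiples of the f0."""
    return [k * f0_hz for k in range(1, int(top_hz // f0_hz) + 1)]


def unvoiced_components(low_hz, top_hz, count, seed=0):
    """Noise: frequencies drawn at random, so the spacing is irregular."""
    rng = random.Random(seed)
    return sorted(rng.uniform(low_hz, top_hz) for _ in range(count))


def spacings(freqs_hz):
    """Distance between each pair of neighbouring frequencies."""
    return [b - a for a, b in zip(freqs_hz, freqs_hz[1:])]


voiced = voiced_components(f0_hz=200, top_hz=2000)
unvoiced = unvoiced_components(low_hz=4000, top_hz=8000, count=10)

print(spacings(voiced))    # every gap equals the f0
print(spacings(unvoiced))  # gaps differ from one pair to the next
```

Feeding either list to a pure-tone generator would give a periodic signal in the voiced case and a hiss-like one in the unvoiced case.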
Consonant Acoustics: Voicing
hiss /hɪs/, his /hɪz/
/eɪb/, /eɪp/
Rule-Based Synthesis
Synthesizing Consonants
• Synthesize the acoustics of speech
Voicing
Voiced: harmonics
• Frequencies at intervals of the f0
• Long preceding vowel, short consonant
Unvoiced: noise
• Frequencies at random intervals
• Short preceding vowel, long consonant
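The duration rule for voicing can be written as a small function that picks segment lengths. Only the direction of the rule comes from the slide; the baseline durations, the scaling ratio, and the function name below are hypothetical values chosen for the example.

```python
def vowel_consonant_durations(consonant_is_voiced, vowel_ms=150.0,
                              consonant_ms=100.0, ratio=1.5):
    """Return (vowel duration, consonant duration) in milliseconds.

    Voiced consonant:   longer preceding vowel, shorter consonant.
    Unvoiced consonant: shorter preceding vowel, longer consonant.
    The baselines and the ratio are assumed, not measured, values.
    """
    if consonant_is_voiced:
        return vowel_ms * ratio, consonant_ms / ratio
    return vowel_ms / ratio, consonant_ms * ratio


# The same vowel before a voiced stop and before an unvoiced stop.
voiced_vowel, voiced_cons = vowel_consonant_durations(True)
unvoiced_vowel, unvoiced_cons = vowel_consonant_durations(False)
```

A synthesizer following this rule would use the returned durations when generating the vowel and the consonant that follows it.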
Consonant Acoustics: Voicing
Liege or Leash?
/liʒ/ /liʃ/
liege (voiced)
leash (unvoiced)
Rule-Based Synthesis in Praat
[s]: 5000 Hz, 5200 Hz, 6000 Hz
[ʃ]: 3000 Hz, 5000 Hz, 5200 Hz
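The slide refers to Praat; as a rough analogue outside Praat, the same frequencies can be mixed in Python and encoded as WAV data. This sketch assumes the listed values are pure-tone components added together, which the slide does not state outright. The sample rate, duration, dictionary keys, and function names are also assumptions.

```python
import io
import math
import struct
import wave

SAMPLE_RATE = 44100  # assumed; well above twice the highest tone (6000 Hz)

# Frequencies as listed on the slide ("sh" stands for [ʃ]).
TONES_HZ = {
    "s": [5000, 5200, 6000],
    "sh": [3000, 5000, 5200],
}


def tone_mix(freqs_hz, duration_s=0.3, sample_rate=SAMPLE_RATE):
    """Sum equal-amplitude pure tones, scaled to stay within [-1, 1]."""
    n = round(duration_s * sample_rate)
    return [sum(math.sin(2 * math.pi * f * i / sample_rate)
                for f in freqs_hz) / len(freqs_hz)
            for i in range(n)]


def to_wav_bytes(samples, sample_rate=SAMPLE_RATE):
    """Encode samples in [-1, 1] as 16-bit mono PCM WAV data."""
    frames = b"".join(
        struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767))
        for s in samples
    )
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(1)   # mono
        wav_file.setsampwidth(2)   # 2 bytes = 16 bits per sample
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(frames)
    return buffer.getvalue()


s_wav = to_wav_bytes(tone_mix(TONES_HZ["s"]))
sh_wav = to_wav_bytes(tone_mix(TONES_HZ["sh"]))
```

Writing `s_wav` to a file opened in binary mode gives a sound file that can be opened in Praat or any audio player for comparison with natural [s].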
Rule-Based Synthesis
• Rule-Based Speech Synthesis:
  • Imitation of the physical properties of speech, using something other than the human voice
• Formant Based Synthesis
• Synthesize the acoustics of speech
  • Harmonics
    • f0
    • Decrease in amplitude
  • Formants
    • F1 and F2
Rule-Based Synthesis: Consonants
• Stops:
  • Silence during closure gesture
  • A lot of high energy when there’s aspiration on the release gesture
• Fricatives
  • Energy is high and long
  • Sibilants are louder than other fricatives
    • /s/: starts >4000 Hz
    • /ʃ/: starts ~3000 Hz
• Nasals
  • Nasal Murmur: loud at ~250 Hz
Voicing
• Voiced: harmonics
  • Frequencies at intervals of the f0
  • Long preceding vowel, short consonant
• Unvoiced: noise
  • Frequencies at random intervals