SPEECH LAB I
1. Bandwidth of the speech signal
What is the sampling frequency? What is the highest frequency that can be seen in the
spectrogram? (be sure to visualize the full spectral range).
The sampling frequency is 48000 Hz (48000 samples per second).
The highest frequency is 426 Hz (G#4):
2. Observe the spectrogram (pay attention to seeing the full range of frequencies). Up to
what frequency would you say that there is ‘information’ in the spectrogram? That frequency
defines the bandwidth of the signal. Let’s call that frequency fmax.
fmax = 18 kHz
3. Filter your signal with a low‐pass filter with a cut‐off frequency equal to fmax (if necessary,
see how to filter in section 5.1). Listen to your filtered signal. Can you notice the difference?
It sounds quieter and flatter.
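As an illustration of what the low‐pass step does, here is a minimal sketch of a windowed-sinc FIR low‐pass filter in plain Python. It is not the filter the lab software applies; the function names, the Hamming window and the 101-tap length are assumptions chosen for the example.

```python
import math

def lowpass_fir(cutoff_hz, fs_hz, num_taps=101):
    """Hamming-windowed sinc low-pass coefficients, scaled to unit gain at 0 Hz."""
    fc = cutoff_hz / fs_hz              # cut-off in cycles per sample
    mid = (num_taps - 1) / 2            # centre of the impulse response
    taps = []
    for n in range(num_taps):
        k = n - mid
        # Ideal (sinc) low-pass impulse response, with the k = 0 limit handled.
        ideal = 2 * fc if k == 0 else math.sin(2 * math.pi * fc * k) / (math.pi * k)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]

def fir_filter(signal, taps):
    """Direct-form convolution; samples before the start are treated as zero."""
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, t in enumerate(taps):
            if i - j >= 0:
                acc += t * signal[i - j]
        out.append(acc)
    return out

# Example: coefficients for fmax = 18 kHz at the 48 kHz sampling rate used here.
taps = lowpass_fir(cutoff_hz=18000, fs_hz=48000)
```

A longer filter gives a sharper transition around the cut‐off at the cost of more computation per sample.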
4. Further filter the signal (you can further filter the already filtered signal or start from the
original signal) now using a cut‐off frequency of 8000 Hz. Can you notice the differences with
the previous signals? This frequency of 8000 Hz is a very common bandwidth used in
speech processing tasks (such as speech synthesis, speech recognition etc.). What is the
minimum sampling frequency to be used with a signal with that bandwidth of 8000 Hz?
Bandwidth = fmax = Fs/2, so Fs = 2 · fmax
Bandwidth → 8000 Hz
Fs → 16000 samples/second (minimum)
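The same relation can be checked numerically. This is a small sketch of the Nyquist arithmetic only; the function names are made up for the example.

```python
def min_sampling_rate_hz(bandwidth_hz):
    """Nyquist criterion: sample at no less than twice the highest frequency."""
    return 2 * bandwidth_hz

def max_bandwidth_hz(sampling_rate_hz):
    """Highest frequency that a given sampling rate can represent (Fs/2)."""
    return sampling_rate_hz / 2

print(min_sampling_rate_hz(8000))   # 16000
print(max_bandwidth_hz(48000))      # 24000.0
```

The second call shows the upper limit of the frequency axis for the 48000 Hz recording used in this lab.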
For the following sections, you can continue using the previous high quality signal (the
original) or you can use one of your own. In this last case, consider that the signal should
offer good quality up to 8000 Hz. Looking at the spectrogram and listening to the signal:
5. What are the sounds (phonemes) that occupy the maximum bandwidth? What kind or
kinds of sounds reach the highest frequencies in your sentence?
The vowels in stressed syllables.
Use a low‐pass filter to limit your signal to a bandwidth of 3600 Hz. Save this signal
for a later step. Listen to the differences between the original sentence and the one filtered
at 3600 Hz.
6. Which are the sounds that have been affected the most?
The sounds that occupied the maximum bandwidth, that is, the vowels in stressed
syllables.
Low‐pass filter the signal using decreasing cut‐off frequency values, i.e., progressively reduce
the bandwidth of the signal (for example, starting from the previous signal filtered at 3600
Hz and using steps of 400 Hz). For each filtering step, listen carefully and observe how some
sounds cannot be identified anymore due to the filtering.
7. Approximately, which cut‐off frequency makes the sentence unintelligible? (It is better to
check this with someone who does not know the contents of the sentence).
3600 → intelligible
3200 →
2800
2400
2000
1600
1200
800
400
Cut-off between 50-5hz
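The stepwise procedure can be organized as a simple sweep over the cut‐off values listed above. This is only a sketch: `is_intelligible` is a hypothetical callback standing in for "filter at this cut‐off, play the result, and ask the listener", and no listening results are assumed here.

```python
def first_unintelligible_cutoff(cutoffs_hz, is_intelligible):
    """Walk the cut-offs from high to low and return the first one that the
    listener cannot understand, or None if every step is understood."""
    for cutoff in cutoffs_hz:
        if not is_intelligible(cutoff):
            return cutoff
    return None

# Cut-off schedule used in this exercise: 3600 Hz down to 400 Hz in 400 Hz steps.
cutoffs_hz = list(range(3600, 0, -400))
print(cutoffs_hz)
```

Recording the listener's answer at every step, rather than only the final one, makes it easier to see where individual sounds stop being identifiable.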
8. Take the signal filtered at 3600 Hz and now use a ‘high pass filter’ with cut‐off frequency
300 Hz to filter that signal. Can you still identify the message? Can you also identify the
speaker? Comment on this process
Yes, the message can still be identified.
Yes, the speaker can also be identified.
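A minimal sketch of the high‐pass step, assuming a simple first-order (RC-style) filter rather than whatever filter the lab software applies. The function name and the test tones mentioned in the note are illustrative assumptions.

```python
import math

def highpass_first_order(signal, cutoff_hz, fs_hz):
    """First-order high-pass: y[n] = a * (y[n-1] + x[n] - x[n-1])."""
    if not signal:
        return []
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)   # equivalent RC time constant
    dt = 1.0 / fs_hz                         # sampling period
    a = rc / (rc + dt)
    out = [signal[0]]
    for n in range(1, len(signal)):
        out.append(a * (out[-1] + signal[n] - signal[n - 1]))
    return out
```

With a 300 Hz cut‐off at 48000 Hz, such a filter strongly attenuates a 50 Hz tone while leaving a 1000 Hz tone almost unchanged. A first-order filter has a gentle slope, so a real band-limiting step would normally use a steeper design.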
PART II
Waveform, spectrogram and spectrum of voiced and unvoiced sounds.
To do this exercise you can use your own recording or a provided signal of your choice. In
either case, be sure that the sentence is phonetically rich (i.e., it contains a variety of
sounds). You can find some examples in section 5.4.
Locate in the signal voiced and unvoiced segments. Use zoom to see the signal properly,
showing approximately one syllable. Note the differences in the waveform and in the
spectrogram of both types of sounds.
1. What differences can you observe in time (waveform) and in frequency (spectrogram)?
Show at least one example each (voiced/unvoiced waveform and spectrogram). Compare
also the spectrum of a voiced sound with the one of an unvoiced sound. To do this, plot the
‘Spectrum’ of a characteristic segment (voiced/unvoiced) (see section 5.2 if you need help to
plot the spectrum). To obtain the spectrum use the parameters for a ‘narrowband’
spectrogram.
2. What are the differences between the spectrum of voiced and unvoiced sounds? Explain
them with detail and show one example.
3. Why should we use the parameters of a narrowband spectrogram to see voicing in the
spectrum and not those of a broadband spectrogram?
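A rough way to reason about this question is to compare the frequency resolution of the analysis window with the fundamental frequency. The sketch below assumes the common rule of thumb that resolution is about 1/T for a window of duration T (tapered windows are somewhat coarser); the 30 ms and 5 ms window lengths and the 120 Hz F0 are illustrative values, not measurements from this lab.

```python
def frequency_resolution_hz(window_s):
    """Approximate frequency resolution of a window of the given duration."""
    return 1.0 / window_s

def resolves_harmonics(window_s, f0_hz):
    """Harmonics are spaced F0 apart, so the resolution must be finer than F0."""
    return frequency_resolution_hz(window_s) < f0_hz

f0_hz = 120.0                             # assumed F0 of a voiced segment
print(resolves_harmonics(0.030, f0_hz))   # long (narrowband) window
print(resolves_harmonics(0.005, f0_hz))   # short (broadband) window
```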
4. How can you measure the fundamental frequency on the waveform, spectrogram and
spectrum? Show one example and give the obtained values for the fundamental frequency
and period.
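On the waveform, the fundamental period T0 is the time between two consecutive repetitions of the pattern, and F0 = 1/T0. One way to automate that reading is to look for the lag at which the signal best matches a shifted copy of itself. The sketch below is an assumption-laden illustration: the function name, the 75 to 500 Hz search range, and the synthetic 200 Hz test signal are not from the lab.

```python
import math

def estimate_f0(signal, fs_hz, f0_min=75.0, f0_max=500.0):
    """Return (F0 in Hz, period in s) from the autocorrelation peak."""
    lag_min = int(fs_hz / f0_max)
    lag_max = int(fs_hz / f0_min)
    best_lag, best_value = lag_min, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        # Similarity between the signal and itself shifted by `lag` samples.
        value = sum(signal[n] * signal[n + lag] for n in range(len(signal) - lag))
        if value > best_value:
            best_lag, best_value = lag, value
    period_s = best_lag / fs_hz
    return 1.0 / period_s, period_s

# Synthetic voiced-like signal: 200 Hz fundamental plus two harmonics.
fs = 16000
signal = [
    math.sin(2 * math.pi * 200 * n / fs)
    + 0.5 * math.sin(2 * math.pi * 400 * n / fs)
    + 0.25 * math.sin(2 * math.pi * 600 * n / fs)
    for n in range(1600)
]
f0, period = estimate_f0(signal, fs)
print(f0, period)
```

On the spectrum, the same value can be read as the spacing between adjacent harmonics.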
5. Using a long window can be disadvantageous for some specific tasks. In which cases can
using a long window be problematic? I.e., if you can choose between a window of length 3‐4
periods or a window of 10‐20 periods, when may it be of interest to use a shorter window?
PART III
Formants
In this exercise, you will obtain the values of the first two formants of (at least) 5 vowels. You
can use your own recordings. For example, you can record the vowels in your mother
tongue. Be sure that you record stable vowels to make your measurements easier. If your
selected language has more than five vowels, the number of vowels to measure is up to
you.
Alternatively, you can download the English vowels from this link:
https://linguistics.ucla.edu/people/hayes/103/Charts/VChart/#TheVowels.
1. Select stable or central points inside each vowel and measure the first two formants and
their bandwidths. It is convenient to observe the energy/intensity curve together with the
spectrogram, to be sure that a point with sufficient energy has been chosen. Use the feature
‘Formant tracks’ to help you find the formant value. Fill in Table 1 with the values obtained
for your recording (add new vowels or change them if needed). Show some graphics with
your measures.
Results obtained with Praat:
VOWEL       F1       BW1         F2        BW2
a (front)   875 Hz   194 Hz      1427 Hz   180 Hz
e           408 Hz   80.5 Hz     2229 Hz   179 Hz
i           209 Hz   67.129 Hz   2598 Hz   325 Hz
o           418 Hz   247 Hz      711 Hz    345 Hz
u           301 Hz   22.5 Hz     753 Hz    44.5 Hz
2. Represent the values of the formants in a graphic of your choice (F1 vs F2, vowel vs
frequency for F1 & F2… ).
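One possible representation is an F1 vs F2 chart with both axes reversed, so that the layout resembles the usual vowel chart. The sketch below draws a coarse text version using only the standard library; the grid size and function name are assumptions. The F1/F2 pairs are the measured values reported in this lab, with F2 of [a] taken as 1427 Hz as in the comparison table.

```python
# Measured (F1, F2) in Hz for each vowel.
FORMANTS = {
    "a": (875, 1427),
    "e": (408, 2229),
    "i": (209, 2598),
    "o": (418, 711),
    "u": (301, 753),
}

def ascii_vowel_chart(formants, width=40, height=12):
    """Place each vowel on a text grid: F2 decreases left to right,
    F1 increases top to bottom."""
    f1s = [f1 for f1, _ in formants.values()]
    f2s = [f2 for _, f2 in formants.values()]
    f1_lo, f1_hi = min(f1s), max(f1s)
    f2_lo, f2_hi = min(f2s), max(f2s)
    grid = [[" "] * width for _ in range(height)]
    for vowel, (f1, f2) in formants.items():
        col = round((f2_hi - f2) / (f2_hi - f2_lo) * (width - 1))
        row = round((f1 - f1_lo) / (f1_hi - f1_lo) * (height - 1))
        grid[row][col] = vowel
    return "\n".join("".join(row) for row in grid)

chart = ascii_vowel_chart(FORMANTS)
print(chart)
```

With this orientation the front vowels appear on the left, the back vowels on the right, and the open vowel at the bottom.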
3. Compare your obtained values with standard values (research the
standard values of the formants in the chosen language). Do they agree?
I took the UCLA vowel chart as a reference (I used their vowel audio files):
VOWEL                                     F1                BW1         F2                  BW2
a (front), lower low central unrounded    810 Hz (875 Hz)   194 Hz      1650 Hz (1427 Hz)   180 Hz
e [e], upper mid front unrounded          425 Hz (408 Hz)   80.5 Hz     2250 Hz (2229 Hz)   179 Hz
i [i], upper high front unrounded         320 Hz (209 Hz)   67.129 Hz   2350 Hz (2598 Hz)   325 Hz
o [o], upper mid back rounded             390 Hz (418 Hz)   247 Hz      710 Hz (711 Hz)     345 Hz
u [u], upper high back rounded            290 Hz (301 Hz)   22.5 Hz     730 Hz (753 Hz)     44.5 Hz
The values I obtained are shown in parentheses (in red in the original). The differences are
mostly small (the largest gaps are in F1 of [i] and F2 of [a]), so I attribute them to
measurement error.
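To make the comparison concrete, the relative deviation of each measured formant from its reference value can be computed as follows. This is a small arithmetic sketch; the dictionary layout and function name are assumptions, and the numbers are copied from the comparison table above.

```python
# (reference, measured) pairs in Hz for each vowel and formant.
FORMANTS = {
    "a": {"F1": (810, 875), "F2": (1650, 1427)},
    "e": {"F1": (425, 408), "F2": (2250, 2229)},
    "i": {"F1": (320, 209), "F2": (2350, 2598)},
    "o": {"F1": (390, 418), "F2": (710, 711)},
    "u": {"F1": (290, 301), "F2": (730, 753)},
}

def relative_deviation_percent(reference_hz, measured_hz):
    """Signed deviation of the measurement from the reference, in percent."""
    return 100.0 * (measured_hz - reference_hz) / reference_hz

for vowel, formants in FORMANTS.items():
    for name, (ref, meas) in formants.items():
        print(f"{vowel} {name}: {relative_deviation_percent(ref, meas):+.1f} %")
```

Expressing the gaps as percentages makes it easier to judge which vowels agree closely with the reference and which ones deserve a second measurement.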