FFT analysis in practice
Perception & Multimedia Computing
Lecture 13
Rebecca Fiebrink
Lecturer, Department of Computing
Goldsmiths, University of London
1
Last Week
• Review of complex numbers: rectangular and
polar representations
• The complex exponential
• The Fourier Series and Discrete Fourier
Transform (DFT)
• The Fast Fourier Transform (FFT)
• Lab:
• Understanding convolution and systems
through hands-on practice
• Signals and convolution in R
2
Today
• Brief lab discussion (more tomorrow)
• Brief coursework discussion
• Using FFT in practice
• Choosing parameters and interpreting output
• Short-time Fourier Transform
• Example applications
• Variants of Fourier transform
3
Lab discussion
• Convolution by hand: Examples
• h = [3, 2, 4]
same as
[3, 0, 0] + [0, 2, 0] + [0, 0, 4]
same as
3[1] + 2[0, 1] + 4[0, 0, 1]
same as
3[1] + 2 T_1{[1]} + 4 T_2{[1]} (where T_k delays its input by k samples)
• x ∗ h = x ∗ 3[1] + x ∗ 2 T_1{[1]} + x ∗ 4 T_2{[1]}
• Example on board
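The decomposition above can be sketched in R as a sum of scaled, delayed copies of x, one per element of h. This is a minimal illustration, not the lab solution; the function name `conv_by_shifts` and the input `x` are hypothetical.

```r
# Convolution as a sum of scaled, shifted copies of x (one copy per tap of h).
conv_by_shifts <- function(x, h) {
  y <- numeric(length(x) + length(h) - 1)
  for (k in seq_along(h)) {
    idx <- k:(k + length(x) - 1)   # positions of x delayed by k - 1 samples
    y[idx] <- y[idx] + h[k] * x    # add the copy, scaled by the k-th tap of h
  }
  y
}

x <- c(1, 2)        # hypothetical input, chosen only for illustration
h <- c(3, 2, 4)     # impulse response from the slide
y <- conv_by_shifts(x, h)
print(y)            # 3 8 8 8
```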
4
The FFT in Practice
5
Fast Fourier Transform (FFT)
Review
• Given a signal, what is its frequency content?
• Helps us understand audio content (pitch, timbre,
melody, rhythm, genre, speech, …)
• Also a building block for designing and understanding
effects (filters, equalization, reverb, echo)
• One of the most powerful and useful
techniques for working with audio, image, and
video!
6
Fast Fourier Transform (FFT)
Review
• Equation: $X_k = \sum_{n=0}^{N-1} x_n e^{-i 2\pi k n / N}$, for k = 0, 1, …, N-1
• Essentially, take the dot product of our signal x
with complex exponentials with periods of N, N/2,
N/3, … 2 samples (i.e., frequencies of 1/N, 2/N,
3/N, … 1/2 oscillations per sample), as well as a DC
component
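To make the "dot product with complex exponentials" idea concrete, here is a minimal sketch of a direct (slow) DFT in R, compared against the built-in `fft()`. The function name `dft_naive` and the test signal are assumptions made for illustration.

```r
# Direct DFT: one dot product per bin k, with a complex exponential of frequency k/N.
dft_naive <- function(x) {
  N <- length(x)
  n <- 0:(N - 1)
  sapply(0:(N - 1), function(k) sum(x * exp(-2i * pi * k * n / N)))
}

x <- c(1, 0, -1, 0, 1, 0, -1, 0)   # cosine with 2 cycles in 8 samples
X_naive <- dft_naive(x)
X_fast <- fft(x)                   # same values, computed with the fast algorithm
max_diff <- max(Mod(X_naive - X_fast))
print(max_diff)                    # should be tiny (rounding error only)
print(round(Mod(X_fast), 6))       # energy in bin 2 and its mirror, bin 6
```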
7
Fast Fourier Transform (FFT)
Review
• Each Xk is a complex number (e.g., 10+5i, or
3∠π/2)
• If the kth frequency is present in the signal, Xk
will have non-zero magnitude, and its
magnitude and phase will tell us how much
of that frequency is present and at what
phase (though not directly)
8
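One way to see what "not directly" means: for a cosine that fits a whole number of periods into the frame, the bin magnitude is N/2 times the amplitude, and the bin angle is the cosine's phase. The sketch below assumes exactly that situation; the values of `N`, `k`, `A`, and `phi` are arbitrary choices for illustration.

```r
N <- 64        # frame length (assumed)
k <- 5         # the cosine completes exactly k cycles in the frame
A <- 0.8       # amplitude we hope to recover
phi <- pi / 3  # phase we hope to recover
n <- 0:(N - 1)
x <- A * cos(2 * pi * k * n / N + phi)

X <- fft(x)
amp_est <- 2 * Mod(X[k + 1]) / N   # bin k lives at index k + 1 in R
phase_est <- Arg(X[k + 1])
print(c(amp_est, phase_est))
```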
Viewing FFT output
1) Spectrum
9
Viewing FFT output
2) Spectrogram
10
What will you hear?
11
What is really output
• N-point FFT computes N complex values
• X0 to XN-1, representing frequencies of 0 Hz
to ((N-1)/N) * SampleRate:
0 Hz, (1/N)*SR, (2/N)*SR, … ((N/2)/N)*SR = (1/2)*SR (Nyquist), … ((N-1)/N)*SR
• These frequencies often called “bins” of FFT
• Note that adjacent bins are (1/N)*SR apart
14
Bins above Nyquist are
redundant
Magnitude spectrum is symmetric around
the Nyquist frequency:
15
Bins above Nyquist are
redundant
Bin k is the complex conjugate of bin N-k:
the two bins are equal in magnitude and opposite in phase
(the phases of these bins are flipped)
18
Why???
If your input is a real-valued sinusoid, the FFT
decomposes it into the sum of two phasors at
the same frequency: one rotating clockwise
and one rotating counterclockwise.
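The two-phasor picture follows from Euler's formula. As a worked identity (with $\omega$ standing for the sinusoid's frequency in radians per sample, a symbol introduced here for illustration):

```latex
\cos(\omega n) = \tfrac{1}{2} e^{i \omega n} + \tfrac{1}{2} e^{-i \omega n}
```

The first term rotates counterclockwise and the second clockwise; their imaginary parts cancel, leaving a real signal. The clockwise phasor is what shows up in the bins above Nyquist.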
19
Practical takeaway so far:
You only need to use bins 0 to N/2 for
analysis, assuming your input signal is real-
valued (and not complex-valued: always
true for audio)
There are specific, simple relationships
between magnitudes & phases of these first
N/2+1 bins and the rest of the bins.
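A quick numerical check of that relationship, as a sketch on random real-valued input (the seed and frame length are arbitrary):

```r
set.seed(1)
N <- 16
x <- rnorm(N)              # any real-valued frame
X <- fft(x)

k <- 1:(N / 2 - 1)         # bins strictly between DC and Nyquist
# Bin k sits at index k + 1, bin N - k at index N - k + 1 (R indexes from 1).
mirror_error <- max(Mod(X[k + 1] - Conj(X[N - k + 1])))
print(mirror_error)        # rounding error only

# DC and Nyquist bins are their own mirrors, so they are purely real.
print(Im(X[c(1, N / 2 + 1)]))
```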
20
Converting from bin
# to frequency in Hz
• N bins of FFT evenly divide frequencies
from 0 Hz to (N-1)/N * SR
• Why not up to sample rate itself?
• SR indistinguishable from 0 Hz!
• We’re chopping frequencies from 0 up to
(but not including) the sample rate into N
bins, so consecutive bins are (1/N)*SR apart
21
Width of
spectrum bins
[Figure: magnitude plotted against frequency f]
$\Delta f = f_{max} / N = \mathrm{SampleRate} / N$
22
Example
I take an FFT of 128 samples; my sample
rate is 1000Hz.
N = 128; I have 128 “bins”.
Bin 0 is? (assuming indexing starting w/ 0)
0 Hz
Bin 1 is?
(1/128) * 1000 ≈ 7.8 Hz
Bin 2 is?
(2/128) * 1000 ≈ 15.6 Hz
23
Example
I take an FFT of 128 samples; my sample rate is
1000Hz.
N = 128; I have 128 “bins”.
Bin 14 is?
(14/128) * 1000 ≈ 109 Hz
Bins nearest to 300 Hz are?
(b/128) * 1000 = 300 → b = 38.4
bins 38 and 39 are closest
Last bin I care about is?
Nyquist: (b/128) * 1000 = 500 → b = 64
(equivalently, equal to N/2)
24
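The arithmetic in this example can be wrapped in two small helpers. The names `bin_to_hz` and `hz_to_bin` are hypothetical; bins are numbered from 0 as in the slides.

```r
bin_to_hz <- function(b, N, sr) b / N * sr   # centre frequency of bin b
hz_to_bin <- function(f, N, sr) f / sr * N   # (fractional) bin for frequency f

N <- 128
sr <- 1000

freqs <- bin_to_hz(c(0, 1, 2, 14), N, sr)
print(freqs)                          # 0, 7.8125, 15.625, 109.375 Hz

b <- hz_to_bin(300, N, sr)            # 38.4, so 300 Hz falls between two bins
nearest <- c(floor(b), ceiling(b))
print(nearest)                        # 38 39

nyquist_bin <- hz_to_bin(sr / 2, N, sr)
print(nyquist_bin)                    # 64, i.e. N / 2
```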
What happens if my signal
contains a frequency that’s
not exactly equal to the
center frequency of a bin?
This frequency will “leak” into
nearby bins.
25
SR = 100Hz, sine at 24 Hz
26
SR = 100Hz, sine at 25 Hz
27
SR = 100Hz, sine at 24.5 Hz
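These three cases can be reproduced numerically. The sketch below assumes N = 64 (the frame length is not stated on the slides), so bins are 100/64 ≈ 1.56 Hz apart and 25 Hz lands exactly on bin 16. The helper `leak_fraction` is hypothetical: it reports the share of spectral energy (bins 0 to N/2) that falls outside the strongest bin.

```r
sr <- 100
N <- 64                     # assumed frame length
n <- 0:(N - 1)

leak_fraction <- function(freq) {
  x <- sin(2 * pi * freq * n / sr)
  mag <- abs(fft(x))[1:(N / 2 + 1)]      # bins 0 to N/2
  1 - max(mag)^2 / sum(mag^2)            # energy outside the peak bin
}

leak_24 <- leak_fraction(24)      # between bins: energy spreads out
leak_25 <- leak_fraction(25)      # exactly on a bin: essentially no leakage
leak_245 <- leak_fraction(24.5)   # between bins: energy spreads out
print(c(leak_24, leak_25, leak_245))
```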
28
How many bins to use?
(What should N be?)
More bins?
Better frequency resolution
Worse time resolution (FFT can’t
detect changes within the analysis frame)
Fewer bins?
Worse frequency resolution
Better time resolution
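To put numbers on the tradeoff, the sketch below computes the bin spacing and the frame duration for two frame lengths, assuming a 44100 Hz sample rate (an assumption; any rate works the same way).

```r
sr <- 44100                  # assumed sample rate
N <- c(64, 4096)             # a short frame and a long frame

delta_f <- sr / N            # bin spacing in Hz (frequency resolution)
frame_ms <- 1000 * N / sr    # frame duration in ms (time resolution)

print(data.frame(N, delta_f, frame_ms))
```

With these settings the short frame lasts about 1.5 ms but its bins are about 689 Hz apart; the long frame resolves about 10.8 Hz but averages over about 93 ms of sound.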
29
Time/Frequency
tradeoff
N=64 N=4096
30
What’s all that extra
stuff in the spectrum?
Not just clean peaks at frequencies and 0 elsewhere…
33
Reasons for this “stuff”
FFT treats your analysis frame as one period of an
infinite, periodic signal.
Signal doesn’t have an integer # of periods in
frame?
→ Contains frequency components
other than 0, (1/N)*SR, (2/N)*SR, … SR/2.
34
Reasons for this “stuff”
FFT treats your analysis frame as one period of an
infinite, periodic signal.
The “periodic” signal may have discontinuities
→ only representable with high frequency content
Stay tuned for a way to help with this…
35
Practice:
Pitch tracking
Q: How many bins should we use?
Q: Algorithm to determine pitch?
36
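Example R code
library(tuneR)                       # readWave() comes from the tuneR package
saw <- readWave("sawtooth.wav")
N <- 2048                            # frame length
X <- fft(saw@left[1:N])              # saw@left gives us left channel samples
mag <- abs(X)[1:(N/2 + 1)]           # keep bins 0 to N/2 only
plot(mag, type="h")
maxbin <- which.max(mag)             # R indexes from 1, so this is bin maxbin-1
maxfreq <- (maxbin - 1) / N * 44100  # assuming 44100 SR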
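The same peak-picking idea can be tried without an audio file. This sketch synthesises a sawtooth at an assumed 440 Hz (the fundamental, frame length, and sample rate are illustrative choices) and picks the strongest bin. The estimate can only be as fine as the bin spacing, SR/N.

```r
sr <- 44100     # assumed sample rate
N <- 2048       # frame length
f0 <- 440       # fundamental of the synthetic test tone (assumed)

n <- 0:(N - 1)
saw <- 2 * ((f0 * n / sr) %% 1) - 1      # naive sawtooth in [-1, 1)

mag <- abs(fft(saw))[1:(N / 2 + 1)]      # bins 0 to N/2
maxbin <- which.max(mag)                 # 1-based index of the strongest bin
maxfreq <- (maxbin - 1) / N * sr         # convert to Hz

print(maxfreq)      # close to 440, within one bin width
print(sr / N)       # bin width, about 21.5 Hz
```

Picking the strongest bin works here because a sawtooth's fundamental is its strongest partial; that is not true of every sound, so treat this as a starting point rather than a general pitch tracker.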
37
How to deal with
music that changes
over time?
Compute FFT at many
points in time.
38
“Short-time Fourier
Transform” (STFT)
[Figure: successive frames of the signal, each analysed with its own N-point FFT]
39
STFT hop size
# of samples between the beginning of one frame
and the beginning of the next
[Figure: two adjacent frames, each analysed with an N-point FFT]
Equivalently, talk about “overlap” between adjacent frames.
Adjust based on application needs.
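A minimal STFT sketch in R, assuming no windowing and dropping any partial frame at the end. The function name `stft_mag` and the two-tone test signal are hypothetical.

```r
# Magnitude STFT: one column per frame, one row per bin (0 to N/2).
stft_mag <- function(x, N, hop) {
  starts <- seq(1, length(x) - N + 1, by = hop)   # first sample of each frame
  sapply(starts, function(s) abs(fft(x[s:(s + N - 1)]))[1:(N / 2 + 1)])
}

sr <- 1000
n <- 0:999
# Test signal: 100 Hz for the first half second, 300 Hz for the second.
x <- c(sin(2 * pi * 100 * n[1:500] / sr),
       sin(2 * pi * 300 * n[501:1000] / sr))

S <- stft_mag(x, N = 256, hop = 128)              # hop of N/2 gives 50% overlap
peak_hz <- (apply(S, 2, which.max) - 1) / 256 * sr
print(dim(S))                                     # 129 bins by 6 frames
print(round(peak_hz, 1))                          # strongest frequency per frame
```

The strongest bin is near 100 Hz in the first frame and near 300 Hz in the last; a single FFT of the whole signal would show both peaks but not when each occurred.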
40
Example applications
of STFT?
Pitch tracking over time (melody extraction)
Onset detection (for rhythm/tempo analysis?)
Audio fingerprinting
More discussion on these in a few weeks
41
Practical FFT Questions
• N = ? (Frame length)
• Balances time & frequency resolutions
• FFT or STFT?
• Is frequency content changing over time?
• If STFT, choose hop size based on granularity of
analysis needed
• Do I care about magnitude, phase, or both?
• Magnitude alone useful for basic timbre analysis,
instrument identification, many other things;
phase required for reconstruction of waveform
***Plus a few other things: revisiting this at end of
lecture***
42
Converting from FFT
back into sound
Option 1: Take magnitude and phase of each bin
(including second half of bins), compute a
sinusoid at appropriate magnitude, frequency,
and phase…
Option 2 (MUCH BETTER): Use inverse FFT (i.e.,
the IFFT)
43
The Inverse Discrete
Fourier Transform (IDFT)
$x_n = \frac{1}{N} \sum_{k=0}^{N-1} X_k e^{i 2\pi k n / N}$
Compare to DFT:
$X_k = \sum_{n=0}^{N-1} x_n e^{-i 2\pi k n / N}$
IDFT is just like DFT, but 1) has 1/N factor and
positive exponent; 2) converts from complex
into real (assuming original signal was real-
valued)
44
The IFFT in practice
Compute IDFT using the IFFT
N FFT bins → N IFFT samples
In R, with signal library:
x <- Re(ifft(X))
(Re discards the imaginary parts, which are zero up to
rounding error when x is real-valued; abs would also
discard the sign of every sample)
45
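A round-trip check using only base R, where the inverse transform is `fft(X, inverse = TRUE)` and the 1/N factor has to be applied by hand. The sample values are arbitrary. The last lines show what goes wrong if the magnitude is taken instead of the real part.

```r
x <- c(0.5, -1, 0.25, 2, -0.75, 0, 1, -2)   # arbitrary real-valued samples
X <- fft(x)

# Base R's inverse FFT is unnormalised, so divide by N ourselves.
x_back <- Re(fft(X, inverse = TRUE)) / length(X)
print(max(abs(x_back - x)))                 # rounding error only

# Taking the magnitude instead of the real part loses every sample's sign.
x_abs <- abs(fft(X, inverse = TRUE)) / length(X)
print(x_abs)                                # equals abs(x), not x
```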
A possible application of
IFFT?
Modify a sound by manipulating its spectrum:
Original signal → FFT → multiply 4th bin by 0.25 → IFFT → modified signal
There are better ways of doing this…
47
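A sketch of this FFT, modify, IFFT pipeline in base R. It assumes a test signal whose components sit exactly on bins 4 and 9 (N = 100 at SR = 1000 Hz gives 10 Hz bins; these values are illustrative). Bin N-k is scaled together with bin k so that the spectrum stays conjugate-symmetric and the output stays real.

```r
sr <- 1000
N <- 100                                   # bins are 10 Hz apart
n <- 0:(N - 1)
x <- sin(2 * pi * 40 * n / sr) + sin(2 * pi * 90 * n / sr)   # bins 4 and 9

X <- fft(x)
k <- 4
X[k + 1] <- 0.25 * X[k + 1]                # bin k (R indexes from 1)
X[N - k + 1] <- 0.25 * X[N - k + 1]        # its mirror, bin N - k

y <- Re(fft(X, inverse = TRUE)) / N        # back to the time domain

# The 40 Hz component is now a quarter of its original amplitude.
expected <- 0.25 * sin(2 * pi * 40 * n / sr) + sin(2 * pi * 90 * n / sr)
print(max(abs(y - expected)))              # rounding error only
```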
Why so many versions of Fourier analysis?
(rows: time / frequency properties) | Continuous Time | Discrete Time
Aperiodic / unbounded time, continuous frequency | Fourier Transform | Discrete-time Fourier Transform (DTFT)
Periodic or bounded time, discrete frequency | Fourier Series | Discrete Fourier Transform (DFT) (FFT used here)
• Each of these also has an inverse.
• You’ll mainly care about the FFT (the fast
algorithm for computing the DFT).
48
How to build useful
systems?
Method 1) Design a useful impulse
response.
49
A very simple system
Impulse in: [1] = [1, 0, 0, …] → H → h[n] = [2] (or h[n] = [0.5])
y[n] = x[n] ∗ h[n]
Volume control!
50
Another very simple system
Impulse in: [1] = [1, 0, 0, …] → H → h[n] = [1, 0, 0, 0.5]
y[n] = x[n] ∗ h[n]
[very simple] echo
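Both simple systems can be tried with base R's `convolve()`, which computes ordinary convolution when the second argument is reversed and `type = "open"` is used. The wrapper `conv` and the three-sample input are hypothetical.

```r
# Ordinary (linear) convolution via base R.
conv <- function(x, h) convolve(x, rev(h), type = "open")

x <- c(1, 2, 3)                        # hypothetical input signal

quieter <- conv(x, 0.5)                # volume control: every sample halved
echoed <- conv(x, c(1, 0, 0, 0.5))     # original plus a half-volume copy 3 samples later

print(round(quieter, 6))               # 0.5 1 1.5
print(round(echoed, 6))                # 1 2 3 0.5 1 1.5
```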
51
More realistic echo
Use this as h[n]
52
Convolution reverb
Record impulse response for concert halls,
churches, etc. Use this as h[n].
53
Example impulse responses…
A simple smoothing system
Take average of nearby points:
54
A simple smoothing system
Impulse in: [1] = [1, 0, 0, …] → H → h[n] = [0.5, 0.5]
y[n] = .5x[n-1] + .5x[n]
55
How to improve this?
Can use h=[0.25, 0.25, 0.25, 0.25],
h=[0.1, 0.1, … 0.1] to make signal even smoother
But there’s a better way...
“smoother” = “less high-frequency content”
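A small experiment that illustrates the link between smoothing and high-frequency content: the two-point average wipes out the fastest possible oscillation but leaves a constant signal alone. The wrapper `conv` and both test signals are hypothetical.

```r
conv <- function(x, h) convolve(x, rev(h), type = "open")

h <- c(0.5, 0.5)                 # two-point average
wiggle <- rep(c(1, -1), 4)       # alternates every sample: the highest frequency
flat <- rep(1, 8)                # constant: the lowest frequency (DC)

y_wiggle <- conv(wiggle, h)
y_flat <- conv(flat, h)

# Ignore the first and last output samples, where the average runs off the ends.
print(round(y_wiggle[2:8], 6))   # all 0: the oscillation is removed
print(round(y_flat[2:8], 6))     # all 1: the constant passes unchanged
```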
56
How to build useful systems?
Method 1) Design a useful impulse response
We have to know how we want the time-
domain sound signal to be changed by the
system.
Method 2) Design a useful frequency response
Instead, we can decide how we want the
spectrum of the sound to be changed by the
system.
57
Frequency response
Any LTI system has the ability to change the
spectrum of a sound
[Figures: each plots relative change in magnitude (1.0 = unchanged) against frequency]
• Doesn’t change magnitude spectrum
• Removes higher frequencies, leaves lower freqs unchanged
• Removes lower frequencies, leaves higher freqs unchanged
• Allows only a range of frequencies to pass through system
• Allows all but a range of frequencies to pass through system
63
Filters
Each of these systems is an
example of a common type of
audio filter.
64
Frequency response
Any LTI system has the ability to change the
spectrum of a sound
[Figures: each plots relative change in magnitude (1.0 = unchanged) against frequency]
• All-pass filter: doesn’t change magnitude spectrum
• Low-pass filter: removes higher frequencies, leaves lower freqs unchanged
• High-pass filter: removes lower frequencies, leaves higher freqs unchanged
• Band-pass filter: allows only a range of frequencies to pass through system
• Band-stop filter: allows all but a range of frequencies to pass through system
69
The frequency response
The effect of a system on a signal can be
understood as multiplying the signal’s spectrum
by the frequency response.
nth bin in input × nth bin in frequency response
= nth bin in output
70
Relationship of frequency
response & impulse response
If h[n] is a system’s impulse response
then the spectrum of h[n] (FFT(h[n]))
is the frequency response!
Impulse in: [1] = [1, 0, 0, …] → H → impulse response h[n] = [1, 0, 0, 0.5]
FFT(h[n]) is the frequency response
71
Consequences
1) Can take the FFT of h[n] to
understand what an arbitrary
system with known h[n] will do to a
spectrum
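As a sketch, here is that recipe applied to two impulse responses from earlier slides: zero-pad h[n] to N samples, take the FFT, and read off the magnitudes for bins 0 to N/2. The helper name `freq_response` and the choice N = 8 are assumptions for illustration.

```r
# Magnitude of the frequency response at bins 0..N/2, from an impulse response.
freq_response <- function(h, N) {
  padded <- c(h, rep(0, N - length(h)))   # zero-pad h to the FFT length
  abs(fft(padded))[1:(N / 2 + 1)]
}

N <- 8
H_avg <- freq_response(c(0.5, 0.5), N)        # the two-point smoother
H_echo <- freq_response(c(1, 0, 0, 0.5), N)   # the very simple echo

print(round(H_avg, 3))    # falls from 1 at DC to 0 at Nyquist: a gentle low-pass
print(round(H_echo, 3))   # ripples between 0.5 and 1.5 across frequency
```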
72
Point-wise multiplication in spectral
domain = convolution in time
domain:
a[n] ∗ b[n] ⟷ A_k × B_k
Point-wise multiplication in time
domain = convolution in spectral
domain:
a[n] × b[n] ⟷ A_k ∗ B_k
73
Convolution & Multiplication
Convolving in the time-domain (x[n] ∗
h[n]) is equivalent to multiplication in the
frequency domain (Xk · Hk).
[Figure: two signals convolved (∗) in the time domain; taking the FFT of each signal and of the result gives spectra related by point-wise multiplication (×)]
74
[Figure: the same relationship in the other direction, using the IFFT to return from the spectra to the time-domain signals]
75
Very important principles
• Convolving in the time-domain (x[n] ∗
h[n]) is equivalent to multiplication in
the frequency domain!
• Also, multiplying in the time domain
is equivalent to convolving in the
frequency domain.
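The first principle can be checked numerically. In this sketch both signals are zero-padded to the length of the full convolution (otherwise the FFT product gives circular rather than ordinary convolution); the signal values are arbitrary.

```r
x <- c(1, 2, 3, 4)            # arbitrary input
h <- c(1, 0, 0, 0.5)          # the simple echo impulse response
L <- length(x) + length(h) - 1

# Frequency-domain route: pad, FFT, multiply bin by bin, inverse FFT.
X <- fft(c(x, rep(0, L - length(x))))
H <- fft(c(h, rep(0, L - length(h))))
y_freq <- Re(fft(X * H, inverse = TRUE)) / L

# Time-domain route: ordinary convolution.
y_time <- convolve(x, rev(h), type = "open")

print(round(y_freq, 6))       # 1 2 3 4.5 1 1.5 2
print(round(y_time, 6))       # same values
```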
76
One big problem…
Filters like this are undesirable.
[Figure: relative change in magnitude (1.0 = unchanged) against frequency]
77
More practical
FFT advice
78
Windowing: Motivation
A problem:
79
Windowing
“Selecting” N time-domain samples is like
point-by-point multiplication with a
rectangular function (“window”):
80
Windowing
A rectangular signal has a very “messy”
spectrum!
Signal:
Spectrum:
81
Windowing
Multiplying a signal by a rectangle in time
is equivalent to convolving their spectra!
82
Solution: Apply a smoother
window
Before taking FFT, multiply the signal with a smooth
window with a “nicer” spectrum
(Equivalently, something that will get rid of sharp edges
at either end of analysis frame)
83
Windowing process
Point-wise multiply the signal with the window,
then apply the FFT to the result.
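A sketch of the windowing process using a Hann window, one common choice, written out by hand so no extra package is needed. The frame length (N = 64), the 24.5 Hz test tone at SR = 100 Hz, and the helper `far_leak` are assumptions for illustration; `far_leak` compares the largest magnitude in bins 0 to 5, far from the tone, with the peak magnitude.

```r
sr <- 100
N <- 64                                   # assumed frame length
n <- 0:(N - 1)
x <- sin(2 * pi * 24.5 * n / sr)          # does not fit a whole number of periods

hann <- 0.5 * (1 - cos(2 * pi * n / (N - 1)))   # tapers to 0 at both frame edges

far_leak <- function(frame) {
  mag <- abs(fft(frame))[1:(N / 2 + 1)]
  max(mag[1:6]) / max(mag)                # leakage far from the peak, relative to the peak
}

leak_rect <- far_leak(x)                  # no window (i.e., rectangular window)
leak_hann <- far_leak(x * hann)           # point-wise multiply, then FFT
print(c(leak_rect, leak_hann))            # the windowed frame leaks far less
```

The window also widens the main peak slightly, which is the price paid for the lower leakage far from it.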
84
Example windows
From http://en.wikipedia.org/wiki/Window_function
85