Code Excited Linear Prediction (CELP)
Aneek Anwar (2012-MS-EE-067), Zaeem Varaich (2012-MS-EE-078), Bilal Hassan (2012-MS-EE-075), Hashim Bhatti (2008-MS-EE-116)
Introduction
CELP is a speech coding algorithm proposed by Schroeder and Atal
- One of the most widely used speech coding algorithms for lossy compression of speech
- Based on the idea of Linear Predictive Coding (LPC)
- Used as a generic term for a variety of codecs, such as:
- MPEG-4 Part 3 (CELP as an MPEG-4 Audio Object Type)
- G.728 - coding of speech at 16 kbit/s using low-delay code-excited linear prediction
- G.718 - uses CELP for the lower two layers for the band (50-6400 Hz) in a two-stage coding structure
- G.729.1 - uses CELP coding for the lower band (50-4000 Hz) in a three-stage coding structure
Background on Speech Signal
The speech signal is quasi-stationary: over short intervals it can be treated as (approximately) periodic
- The signal is split into short-time overlapping frames by windowing
- All subsequent processing is done on these frames
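As an illustration, framing with a Hamming window might look like this (the frame and hop sizes are typical 8 kHz values, assumed here):

```python
import math

def frame_signal(x, frame_len=160, hop=80):
    """Split a signal into overlapping frames and apply a Hamming window.
    160-sample frames with an 80-sample hop give 20 ms frames with 50 %
    overlap at an 8 kHz sampling rate (typical values, assumed here)."""
    window = [0.54 - 0.46 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    return [[x[start + n] * window[n] for n in range(frame_len)]
            for start in range(0, len(x) - frame_len + 1, hop)]

# 100 ms of a 200 Hz tone sampled at 8 kHz (illustrative input)
tone = [math.sin(2 * math.pi * 200 * n / 8000) for n in range(800)]
frames = frame_signal(tone)
print(len(frames), len(frames[0]))  # 9 frames of 160 windowed samples
```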
Source-Filter model of Speech
- Assumes a source of sound and a filter that shapes that sound, organized so that the source and the filter are independent
- The source is the vocal cords in the larynx; the filter is the vocal tract
- The glottis vibrates with a fundamental frequency f0, so the source contains f0 and its harmonics
- The vocal tract resonates at certain frequencies, called formants, for different vowels, so the spectrum has peaks at these formants
Linear Prediction Coefficients (LPCs)
- Based on the source-filter model of speech
- The source can be modelled by an impulse train with frequency f0 in the case of voiced speech, and by white noise in the case of unvoiced speech
- The filter can be modelled as an all-pole filter with poles at the formant frequencies, so

H(z) = 1 / A(z) = 1 / (1 - a1 z^-1 - a2 z^-2 - ... - ap z^-p)
LPCs contd.
We can predict the next sample as a linear combination of the previous p samples, hence the name linear prediction:

ŝ(n) = a1 s(n-1) + a2 s(n-2) + ... + ap s(n-p)

or

ŝ(n) = sum_{k=1}^{p} a_k s(n-k)

The error between the original and the predicted sample is

e(n) = s(n) - ŝ(n) = s(n) - sum_{k=1}^{p} a_k s(n-k)
LPCs contd.
Taking the z-transform of the prediction error, we get

E(z) = S(z) - sum_{k=1}^{p} a_k z^-k S(z) = S(z) A(z)

- So S(z) is given as S(z) = E(z) / A(z) = E(z) H(z), where H(z) = 1/A(z)
- So we just need to find the coefficients a_k to model the filter
- The a_k can be computed using the least-squares criterion or the Levinson-Durbin algorithm
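As a sketch, the a_k can be obtained from the frame's autocorrelation with the Levinson-Durbin recursion (pure-Python illustration; the AR(1) test signal is invented):

```python
import random

def autocorr(x, max_lag):
    """Autocorrelation values r[0..max_lag] of the frame x."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x)))
            for k in range(max_lag + 1)]

def levinson_durbin(r, p):
    """Solve the LPC normal equations for a_1..a_p from the
    autocorrelation values r[0..p] via the Levinson-Durbin recursion."""
    a = [0.0] * (p + 1)   # a[k] holds a_k; a[0] is unused
    err = r[0]            # prediction-error energy
    for i in range(1, p + 1):
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        new_a = a[:]
        new_a[i] = k      # reflection coefficient becomes a_i
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a, err = new_a, err * (1 - k * k)
    return a[1:], err

# invented AR(1) test signal: s(n) = 0.5 s(n-1) + white noise
random.seed(0)
s = [0.0]
for _ in range(2000):
    s.append(0.5 * s[-1] + random.gauss(0, 1))
coeffs, err = levinson_durbin(autocorr(s, 2), 2)
print([round(c, 2) for c in coeffs])  # a_1 near 0.5, a_2 near 0
```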
Long Term Prediction (LTP)
The idea is to predict one period of signal from the preceding one
x̂(n) = b x(n - M)
There are two unknowns, b and M:
- M is the pitch period and can be estimated using any pitch-estimation technique
- b is the unknown coefficient, estimated using the least-squares criterion
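Both unknowns can be estimated per frame; a sketch using autocorrelation peak picking for M and the closed-form least-squares solution for b (the lag range and test signal are illustrative):

```python
import math

def estimate_ltp(x, min_lag=20, max_lag=147):
    """Estimate the pitch period M by autocorrelation peak picking, then
    the gain b by least squares for the predictor x(n) ~ b * x(n - M)."""
    def corr(m):
        return sum(x[n] * x[n - m] for n in range(m, len(x)))
    m = max(range(min_lag, max_lag + 1), key=corr)   # lag of maximum correlation
    den = sum(x[n - m] ** 2 for n in range(m, len(x)))
    b = corr(m) / den if den else 0.0                # least-squares gain
    return m, b

# illustrative test: a sinusoid with a 40-sample period
x = [math.sin(2 * math.pi * n / 40) for n in range(320)]
m, b = estimate_ltp(x)
print(m, round(b, 2))  # pitch period 40, gain 1.0
```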
Vector Quantization
Vector quantization (VQ) allows the modeling of probability density functions by the distribution of prototype vectors. It works by dividing a large set of points (vectors) into groups having approximately the same number of points closest to them. Each group is represented by its centroid point, as in k-means and some other clustering algorithms. Since data points are represented by the index of their closest centroid, commonly occurring data have low error, and rare data high error. VQ is quite suitable for lossy data compression.
Vector Quantization: Definition
Blocks form vectors, e.g.:
- a sequence of audio samples
- a block of image pixels

A vector quantizer maps k-dimensional vectors in the vector space R^k into a finite set of vectors:
- Unquantized vector: x = [x0, x1, ..., x_{k-1}]^T
- Quantized vector: y = VQ(x) = r_i, if x ∈ C_i
- Reconstruction vector (codeword): r_i
- Codebook: the set of all the codewords {r_0, r_1, ...}
- Voronoi region C_i: the nearest-neighbor region of codeword r_i
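A minimal 2-D sketch of codebook training and quantization using Lloyd's algorithm (k-means); the cluster centers and sizes are invented for illustration:

```python
import random

def quantize(p, codebook):
    """Index of the nearest codeword: the transmitted symbol."""
    return min(range(len(codebook)),
               key=lambda j: sum((a - b) ** 2 for a, b in zip(p, codebook[j])))

def train_vq(points, n_codewords=4, iters=20):
    """Train a codebook with Lloyd's algorithm (k-means): assign each
    vector to its nearest codeword (its Voronoi region), then move each
    codeword to the centroid of its region."""
    step = max(1, len(points) // n_codewords)
    codebook = [list(points[i * step]) for i in range(n_codewords)]  # spread init
    for _ in range(iters):
        regions = [[] for _ in codebook]
        for p in points:
            regions[quantize(p, codebook)].append(p)
        codebook = [[sum(c) / len(r) for c in zip(*r)] if r else codebook[i]
                    for i, r in enumerate(regions)]
    return codebook

# four well-separated 2-D clusters (invented for illustration)
rng = random.Random(0)
pts = [(cx + rng.gauss(0, 0.1), cy + rng.gauss(0, 0.1))
       for cx, cy in [(0, 0), (0, 5), (5, 0), (5, 5)] for _ in range(50)]
cb = train_vq(pts)
print(quantize((0.1, 4.9), cb) == quantize((-0.1, 5.2), cb))  # same Voronoi region
```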
Vector Quantizer: 2-D example
Vector Quantization Procedure
CELP
CELP is based on the previously discussed concepts
- An LPC filter is used to model the vocal tract, along with Long Term Prediction
- The error signal, which acts as the excitation signal in the source-filter model, is quantized using VQ; both fixed and adaptive codebooks are used
- A perceptual weighting filter is added to reduce audible noise
CELP algorithm
Encoding
- LPC analysis to obtain the synthesis filter H(z)
- Define the perceptual weighting filter. This permits more noise at formant frequencies, where it will be masked by the speech
- Synthesize speech using each codebook entry in turn as the input to the synthesis filter
- Calculate the optimum gain to minimize the perceptually weighted error energy in the speech frame
- Select the codebook entry that gives the lowest error
- Transmit the LPC parameters and the codebook index
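The closed-loop ("analysis-by-synthesis") search described above can be sketched as follows; the toy first-order filter, frame length, and random codebook are illustrative assumptions, and the perceptual weighting is omitted for brevity:

```python
import random

def synth(excitation, a):
    """All-pole synthesis 1/A(z): s(n) = e(n) + sum_k a_k s(n-k)."""
    out = []
    for n, e in enumerate(excitation):
        out.append(e + sum(a[k] * out[n - 1 - k]
                           for k in range(len(a)) if n - 1 - k >= 0))
    return out

def search_codebook(target, codebook, a):
    """Try every codebook entry: synthesize it, fit the optimal gain by
    least squares (g = <target, y> / <y, y>), keep the lowest error."""
    best = (None, 0.0, float("inf"))
    for i, entry in enumerate(codebook):
        y = synth(entry, a)
        den = sum(v * v for v in y)
        g = sum(t * v for t, v in zip(target, y)) / den if den else 0.0
        err = sum((t - g * v) ** 2 for t, v in zip(target, y))
        if err < best[2]:
            best = (i, g, err)
    return best

rng = random.Random(0)
a = [0.5]                                          # toy first-order LPC filter
codebook = [[rng.choice((-1, 0, 1)) for _ in range(40)] for _ in range(8)]
target = [2.0 * v for v in synth(codebook[3], a)]  # frame built from entry 3
idx, gain, err = search_codebook(target, codebook, a)
print(idx, round(gain, 1))  # recovers entry 3 with gain 2.0
```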
Decoding
- Receive the LPC parameters and the codebook index
- Resynthesize speech using H(z) and the codebook entry
CELP Basic Encoder
[Figure: basic CELP encoder. A normalized stochastic codebook entry is scaled by the gain g, passed through the LTP filter 1/B(z) and the LPC filter 1/A(z) to produce the synthetic speech signal, which is compared with the original speech signal x under the least-squares (LS) criterion.]

where
- 1/B(z) represents the Long Term Prediction filter
- 1/A(z) is the LPC filter
- g is the gain
CELP: Adding a perceptual filter
We want to choose the LTP delay and codebook entry that gives the best sounding resynthesized speech. We exploit the phenomenon of masking: a listener will not notice noise at the formant frequencies because it will be overwhelmed by the speech energy. We therefore filter the error signal by:
W(z) = H(z/0.8) / H(z) = A(z) / A(z/0.8)
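This weighting can be implemented by bandwidth expansion of the LPC coefficients: replacing z with z/0.8 scales each a_k by 0.8^k. A minimal sketch (the coefficient values are invented for illustration):

```python
def perceptual_weighting(a, gamma=0.8):
    """Coefficients of W(z) = A(z) / A(z/gamma).
    Numerator:   A(z)       = 1 - sum_k a_k z^-k          -> a_k unchanged
    Denominator: A(z/gamma) = 1 - sum_k a_k gamma^k z^-k  -> a_k * gamma^k"""
    num = list(a)
    den = [ak * gamma ** (k + 1) for k, ak in enumerate(a)]
    return num, den

a = [1.2, -0.5]   # invented LPC coefficients a_1, a_2
num, den = perceptual_weighting(a)
print(num, [round(d, 2) for d in den])  # denominator taps shrink toward the origin
```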
[Figure: CELP encoder with perceptual weighting. Same structure as the basic encoder, but the error between the original speech signal x and the synthetic speech is filtered by W(z) before the least-squares criterion.]
Block Diagram of complete encoder
[Figure: complete CELP encoder. The original signal undergoes LPC analysis and pitch estimation (analysis-by-synthesis). Each waveform from the codebook is scaled by the gain g and filtered by the LTP filter 1/B(z) and the LPC filter 1/A(z) to form synthetic speech; the perceptual criterion W(z) drives the search for the best code and gain, iterating over the whole codebook. An inset plot shows the spectra 1/A(f) and W(f) from 0 to 3500 Hz.]
Fixed and Adaptive Codebook
- The stochastic (fixed) codebook normally contains 1082 independent random values from the set {-1, 0, +1} with probabilities {0.1, 0.8, 0.1}
- The adaptive codebook is formed from the Long Term Prediction (LTP) filter
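Such a ternary codebook can be generated directly from those probabilities; the sizes here (128 entries of 40 samples) are illustrative assumptions:

```python
import random

def make_stochastic_codebook(n_entries, dim, seed=0):
    """Draw ternary codewords from {-1, 0, +1} with probabilities
    {0.1, 0.8, 0.1}, matching the description above."""
    rng = random.Random(seed)
    return [[rng.choices((-1, 0, 1), weights=(0.1, 0.8, 0.1))[0]
             for _ in range(dim)]
            for _ in range(n_entries)]

cb = make_stochastic_codebook(n_entries=128, dim=40)
flat = [v for entry in cb for v in entry]
print(len(cb), len(cb[0]), round(flat.count(0) / len(flat), 1))  # zeros ~ 0.8
```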
CELP Encoder with Adaptive Codebook
[Figure: CELP encoder with adaptive codebook. The input speech frame is weighted by W(z), and the contribution of the H-filter memory is subtracted. The adaptive codebook entry c1,i, scaled by g1 and filtered by H(z), is subtracted first; then the stochastic codebook entry c2,i, scaled by g2 and filtered by H(z), is matched to the remaining error under the least-squares criterion.]
Transmitted Parameters
- Adaptive codebook gain and index
- Fixed codebook gain and index
- LPC filter coefficients
CELP Decoder
Decode the received parameters:
- index of the stochastic codebook
- gain of the stochastic codebook
- index of the adaptive codebook
- gain of the adaptive codebook
- linear prediction filter coefficients

[Figure: CELP decoder. The adaptive codebook output g1 · c1,i and the stochastic codebook output g2 · c2,i are summed to form the excitation, which is filtered by 1/A(z) to produce the synthetic speech.]
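A toy decoder sketch following the structure above (the codewords, gains, and first-order LPC filter are invented for illustration, not from a real codec):

```python
def lpc_synthesis(excitation, a):
    """All-pole LPC synthesis 1/A(z): s(n) = e(n) + sum_k a_k s(n-k)."""
    s = []
    for n, e in enumerate(excitation):
        s.append(e + sum(a[k] * s[n - 1 - k]
                         for k in range(len(a)) if n - 1 - k >= 0))
    return s

def celp_decode_frame(adaptive, g1, stochastic, g2, a):
    """Excitation = g1 * adaptive codeword + g2 * stochastic codeword,
    filtered through 1/A(z) to give the synthetic speech frame."""
    excitation = [g1 * x + g2 * y for x, y in zip(adaptive, stochastic)]
    return lpc_synthesis(excitation, a)

# invented codewords, gains, and first-order filter, for illustration only
adaptive = [1.0] + [0.0] * 39
stochastic = [0.0, 1.0] + [0.0] * 38
frame = celp_decode_frame(adaptive, 0.5, stochastic, 0.25, [0.9])
print([round(v, 3) for v in frame[:4]])  # mix of decaying impulse responses
```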
Various Standards for Speech coding
ITU (UIT) standards:

Standard   Method         Year   Bit rate (kbps)   Delay (ms)   Quality (MOS)   Complexity (MIPS)
G.711      PCM            1972   64                0.125        4.3             << 1
G.721      ADPCM          1984   32                0.125        4.1 (at 32)     1.25
G.723      ADPCM          1986   40/32/24
G.726      ADPCM          1988   40/32/24/16
G.727      ADPCM          1990   40/32/24/16
G.728      LD-CELP        1992   16                2.5          4.0             30
G.729      CS-ACELP       1994   8                 30           3.9             25
G.729a     CS-ACELP       1996   8                                              12
G.723.1    MP-MLQ/ACELP   1995   6.3/5.3           75           3.9             24
Compression Ratio
- For normal PCM speech, we use 8 bits per sample at a sampling rate of 8 kHz, giving a data rate of 64 kbps
- For various CELP standards, the data rate can be as low as 6 kbps for the same signal

So:

COMPRESSION RATIO = 64/6 ≈ 10.7
References
B. S. Atal, "The History of Linear Prediction," IEEE Signal Processing Magazine, vol. 23, no. 2, pp. 154-161, March 2006.
M. R. Schroeder and B. S. Atal, "Code-excited linear prediction (CELP): high-quality speech at very low bit rates," in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), vol. 10, pp. 937-940, 1985.
L. R. Rabiner and R. W. Schafer, Digital Processing of Speech Signals, Prentice-Hall (Signal Processing Series), 1978.