Lecture: 3 Date: 6-8-11 Topics: Numeric representation used in DSP (fixed point, floating point)
Endians:
Big endian: MSB in first location
Little endian: LSB in first location
How will 12345678 be stored in four locations starting from 4000 in each case?
TI DSP: little endian
Motorola DSP: big endian
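The storage question above can be checked with a short sketch. It assumes that 12345678 is a hexadecimal 32-bit word, that each location holds one byte, and that 4000 is simply the label of the first location; `memory_layout` is a hypothetical helper name, not part of any DSP toolchain.

```python
def memory_layout(value: int, base: int, byteorder: str) -> dict:
    """Map each address to the byte stored there for a 32-bit word.

    byteorder is "big" (most significant byte first) or
    "little" (least significant byte first).
    """
    data = value.to_bytes(4, byteorder)
    return {base + i: byte for i, byte in enumerate(data)}


word = 0x12345678  # assumption: the slide's 12345678 is hexadecimal

for order in ("big", "little"):
    layout = memory_layout(word, 4000, order)
    cells = ", ".join(f"{addr}: {byte:02X}" for addr, byte in layout.items())
    print(f"{order:>6} endian -> {cells}")
```

Under these assumptions, big endian places 12 at 4000 and 78 at 4003, while little endian places 78 at 4000 and 12 at 4003.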
Numeric Representation used in DSP
Fixed-point
16-bit, 20-bit, 24-bit
Floating-point
32-bit, 64-bit, 80-bit, 128-bit
Fixed-Point Notation
A 16-bit fixed-point number can be interpreted as either:
Integer (e.g., 20645)
Fractional number (e.g., 0.75)
Integer:
Unsigned integer (from 0 to 2^16 - 1, i.e. 65,535)
Signed integer (from -32,768 to 32,767)
N-bit fixed point, 2's complement integer representation:
$X = -b_{N-1} 2^{N-1} + b_{N-2} 2^{N-2} + \dots + b_0 2^0$
What will be the value of 10101100?
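A minimal sketch of the 2's complement weighting, where the MSB carries a negative weight and every other bit a positive one. It reads the 8-bit pattern in the question as 10101100; `twos_complement_value` is a hypothetical helper name.

```python
def twos_complement_value(bits: str) -> int:
    """Evaluate an N-bit 2's complement pattern given as a string of 0s and 1s."""
    n = len(bits)
    # The MSB b_{N-1} has weight -2^(N-1).
    value = -int(bits[0]) * 2 ** (n - 1)
    # The remaining bits b_{N-2} ... b_0 have weights 2^(N-2) ... 2^0.
    for position, bit in enumerate(bits[1:], start=1):
        value += int(bit) * 2 ** (n - 1 - position)
    return value


print(twos_complement_value("10101100"))  # -128 + 32 + 8 + 4 = -84
```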
Some parameters to define representation accuracy:
Precision: smallest step (difference) between two consecutive N-bit numbers.
Dynamic range: ratio between the largest number and the smallest (positive) number. It can be expressed in dB (decibels) as follows:
$\text{Dynamic Range (dB)} = 20 \log_{10}(\text{Max} / \text{Min})$
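As an illustration of the dynamic range formula, the sketch below evaluates it for a 16-bit signed integer, taking Max = 32767 and Min = 1 (the smallest positive value). The function name `dynamic_range_db` is hypothetical.

```python
import math


def dynamic_range_db(max_value: float, min_value: float) -> float:
    """Dynamic range in decibels: 20 * log10(Max / Min)."""
    return 20 * math.log10(max_value / min_value)


# 16-bit signed integer: largest value 32767, smallest positive value 1.
print(round(dynamic_range_db(32767, 1), 2))  # about 90.31 dB
```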
Quantization error is the numeric error introduced when a longer numeric format is converted to a shorter one, e.g., when we round 1.325 to 1.33, we introduce a quantization error of 0.005.
What is the precision of the integer representation? DSP needs much finer precision, so a fractional number representation is used.
Fractional Fixed-Point Representation
Called Q-format (Quantity of fractional bits)
General Fractional Fixed-Point Representation
Q m.n notation:
m bits for integer portion
n bits for fractional portion
Total number of bits N = m + n + 1, for signed numbers
Example: 16-bit number (N = 16) and Q2.13 format
2 bits for integer portion
13 bits for fractional portion
1 sign bit (MSB)
Special cases:
16-bit integer number (N = 16) => Q15.0 format
16-bit fractional number (N = 16) => Q0.15 format; also known as Q.15 or Q15
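[Figure: 16-bit word layouts showing the binary point position. Q15.0: sign bit S followed by a 15-bit integer. Q.15 or Q15 (used in DSP): sign bit S followed by a 15-bit fraction. Q1.14: upper 2 bits, then the binary point, then the remaining 14 bits.]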
Q15 is used in 16-bit DSP chips; the resolution of the fraction is 2^-15, or 30.518e-6.
Q15 means scaling by 1/2^15.
Q15 means shifting to the right by 15.
Example: how to represent 0.2625 in memory:
Method 1 (Truncation): INT[0.2625 * 2^15] = INT[8601.6] = 8601 = 0010000110011001
Method 2 (Rounding): INT[0.2625 * 2^15 + 0.5] = INT[8602.1] = 8602 = 0010000110011010
Rounding or truncating in this way introduces quantization error.
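The two methods can be sketched as follows. This assumes INT[...] means taking the integer part of a non-negative number (implemented here with `math.floor`); `to_q15` is a hypothetical helper name.

```python
import math


def to_q15(x: float, rounding: bool = False) -> int:
    """Convert a fraction in [0, 1) to its 16-bit Q15 integer code.

    rounding=False reproduces Method 1 (truncation);
    rounding=True reproduces Method 2 (add 0.5, then truncate).
    """
    scaled = x * 2 ** 15
    if rounding:
        scaled += 0.5
    return math.floor(scaled)


truncated = to_q15(0.2625)
rounded = to_q15(0.2625, rounding=True)
print(truncated, format(truncated, "016b"))  # 8601 0010000110011001
print(rounded, format(rounded, "016b"))      # 8602 0010000110011010
```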
Represent 0.95624 in Q4 format. Use truncation and rounding.
Find the quantization error. Ans: 0.01874
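One way to check the stated answer is to quantize the value and subtract. The sketch assumes that "Q4" means 4 fractional bits (scaling by 2^4), which is consistent with the answer given; `quantize` is a hypothetical helper name.

```python
import math


def quantize(x: float, frac_bits: int, rounding: bool = False) -> float:
    """Return the value actually represented after storing x with frac_bits fractional bits."""
    scaled = x * 2 ** frac_bits
    if rounding:
        scaled += 0.5
    return math.floor(scaled) / 2 ** frac_bits


x = 0.95624
for rounding in (False, True):
    stored = quantize(x, 4, rounding)
    label = "rounding" if rounding else "truncation"
    print(f"{label}: stored = {stored}, error = {x - stored:.5f}")
```

Here 0.95624 * 2^4 = 15.29984, so truncation and rounding both give the code 15 (binary 1111), the stored value is 0.9375, and the error is 0.01874 in both cases.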
Error in fixed point multiplication:
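The slide content for this heading is not available, so the following is only an illustrative sketch of one common source of multiplication error, assuming Q15 operands: the product of two Q15 numbers has 30 fractional bits, and shifting it right by 15 to return to Q15 discards the low-order bits. The helper names `to_q15` and `q15_mul` are hypothetical.

```python
import math

Q = 15  # number of fractional bits (Q15)


def to_q15(x: float) -> int:
    """Round a fraction in [0, 1) to its Q15 integer code."""
    return math.floor(x * (1 << Q) + 0.5)


def q15_mul(a: int, b: int) -> int:
    """Multiply two Q15 codes and return a Q15 code."""
    product_q30 = a * b      # exact product, 30 fractional bits
    return product_q30 >> Q  # back to Q15: the low 15 bits are dropped


a, b = to_q15(0.3), to_q15(0.7)
p = q15_mul(a, b)
# The error combines the quantization of the inputs and the truncated shift.
error = 0.3 * 0.7 - p / (1 << Q)
print(a, b, p, error)
```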
Overflow in Fixed point addition:
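The slide content for this heading is not available, so the following is only an illustrative sketch, assuming 16-bit Q15 operands: 0.75 + 0.5 = 1.25 lies outside the Q15 range [-1, 1), so a 16-bit 2's complement adder wraps the sum around to a negative number. Saturation is shown as one commonly used remedy, not as the method of any particular chip. `wrap16` and `saturate16` are hypothetical helper names.

```python
def wrap16(x: int) -> int:
    """Keep the low 16 bits and reinterpret them as a signed 2's complement value."""
    return ((x + 0x8000) & 0xFFFF) - 0x8000


def saturate16(x: int) -> int:
    """Clamp to the 16-bit signed range instead of wrapping."""
    return max(-0x8000, min(0x7FFF, x))


a = 24576  # 0.75 in Q15
b = 16384  # 0.5 in Q15

wrapped = wrap16(a + b)       # -24576, i.e. -0.75 instead of 1.25
clamped = saturate16(a + b)   # 32767, the largest Q15 value
print(wrapped / 2 ** 15, clamped / 2 ** 15)
```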