By: Mehrnaz Monajati
Instructor: Dr. S.M. Fakhrai
This is a class presentation. All data are the copyright of
their respective authors, as listed in the references, and
are used here for educational purposes only.
Fixed vs. Floating Point
DSPs
Cost
Ease of use
Accuracy
Dynamic range
Fixed vs. Floating Point
DSPs
Cost
Today, fixed-point DSPs continue to benefit more
from manufacturing economies of scale,
since they are more often used in high-volume
applications
The same reductions will apply to floating-point
DSPs when high-volume demand for those devices
appears
Today, cost has increasingly become an issue of
SOC integration and volume, rather than a result
of the size of the DSP core itself
Fixed vs. Floating Point
DSPs
Ease of use
In the past:
Programming
TI floating-point DSPs supported the C language
FXP DSPs were programmed at the assembly code level
Coding of real arithmetic into hardware
Directly in FLP
Indirectly in FXP, through software routines that added development time and extra instructions to the algorithm
Today:
TI fixed-point DSPs have long been supported by outstandingly efficient C compilers
The advantage of implementing real arithmetic directly in floating-point hardware still remains
Reduction in FXP complexity
FXP DSPs still have an edge in cost and FLP DSPs in ease of use, but the edge has narrowed
Fixed vs. Floating Point
DSPs
Accuracy
Dynamic range
Accuracy of FLP is greater than that of FXP
FLP has greater precision in integer as well as real
values
Exponentiation vastly increases the dynamic
range
Internal data representations in FLP DSPs are more
exact than in FXP,
ensuring greater accuracy in the end result
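The effect of the exponent on dynamic range can be estimated with a few lines of Python. This is a rough sketch, not a figure taken from the references: it assumes dynamic range is defined as 20*log10(largest magnitude / smallest nonzero magnitude), a 16-bit signed fixed-point word, and IEEE-754 single precision restricted to normalized values. The function name `dynamic_range_db` is made up for illustration.

```python
import math


def dynamic_range_db(largest: float, smallest: float) -> float:
    """Dynamic range in decibels: 20 * log10(largest / smallest)."""
    return 20.0 * math.log10(largest / smallest)


# 16-bit signed fixed point: magnitudes run from 1 LSB up to 2**15.
fxp16_db = dynamic_range_db(2.0 ** 15, 1.0)

# IEEE-754 single precision, normalized values only (assumption):
# largest = (2 - 2**-23) * 2**127, smallest = 2**-126.
flp32_max = (2.0 - 2.0 ** -23) * 2.0 ** 127
flp32_min = 2.0 ** -126
flp32_db = dynamic_range_db(flp32_max, flp32_min)

print(f"16-bit fixed point     : {fxp16_db:.1f} dB")
print(f"32-bit floating point  : {flp32_db:.1f} dB")
```

Under these assumptions the fixed-point word covers roughly 90 dB, while the 8-bit exponent stretches the floating-point format to about 1500 dB.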
Fixed vs. Floating Point
DSPs
FXP DSPs
TI's TMS320C62x FXP DSPs
Two data paths operating in parallel
Each with a 16-bit word width
Provides signed integer values within a range from -2^15 to
2^15 - 1
TMS320C64x DSPs
Double the overall throughput with four 16-bit multipliers
TMS320C5x and TMS320C2x DSPs
Designed for handheld and control applications, respectively
Based on single 16-bit data paths
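To make the 16-bit signed range concrete, here is a minimal sketch of fractional arithmetic on a 16-bit data path. It assumes the common Q15 convention (1 sign bit, 15 fraction bits), which the slides do not name explicitly, and all helper names (`saturate16`, `to_q15`, `from_q15`, `q15_mul`) are hypothetical. The scaling, rounding and saturation steps are the kind of extra work a fixed-point program has to do in software.

```python
INT16_MIN = -(2 ** 15)   # -32768, most negative 16-bit signed value
INT16_MAX = 2 ** 15 - 1  #  32767, most positive 16-bit signed value


def saturate16(value: int) -> int:
    """Clamp an integer to the 16-bit signed range instead of wrapping."""
    return max(INT16_MIN, min(INT16_MAX, value))


def to_q15(x: float) -> int:
    """Quantize a real value in [-1, 1) to Q15 (assumed format)."""
    return saturate16(int(round(x * 2 ** 15)))


def from_q15(q: int) -> float:
    """Convert a Q15 integer back to a real value."""
    return q / 2 ** 15


def q15_mul(a: int, b: int) -> int:
    """Multiply two Q15 values.

    The raw product has 30 fraction bits, so it is rounded and
    shifted right by 15 to return to Q15, then saturated.
    """
    product = a * b
    rounded = (product + (1 << 14)) >> 15
    return saturate16(rounded)


half = to_q15(0.5)
print(from_q15(q15_mul(half, half)))            # 0.25
print(q15_mul(INT16_MIN, INT16_MIN))            # saturates at 32767
```

The last line shows why saturation matters: (-1) x (-1) = +1 cannot be represented in Q15, so the result is clamped to the largest positive value.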
Fixed vs. Floating Point
DSPs
FLP DSPs
TMS320C67x FLP DSPs
Divide a 32-bit data path into two parts: a 24-bit mantissa and
an 8-bit exponent
The 24-bit mantissa gives a 16M (2^24) range of precision
The exponent supports a vastly greater dynamic range than is available
with the FXP format
The C67x DSP can also perform calculations
using industry-standard double-width precision
64 bits, including a 53-bit mantissa and an 11-bit exponent
Achieves much greater precision and dynamic range at the
expense of speed, since it requires multiple cycles for each
operation
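The mantissa/exponent split can be inspected directly. The sketch below assumes the IEEE-754 single-precision layout (1 sign bit, 8 exponent bits with a bias of 127, 23 stored fraction bits); the 24-bit mantissa quoted above then corresponds to the 23 stored bits plus the implicit leading 1. The function names are illustrative only, and `rebuild_normal` handles normalized values only.

```python
import struct


def decode_float32(x: float):
    """Split a value, stored as IEEE-754 single precision, into its fields."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    sign = bits >> 31               # 1 bit
    exponent = (bits >> 23) & 0xFF  # 8 bits, biased by 127
    fraction = bits & 0x7FFFFF      # 23 stored mantissa bits
    return sign, exponent, fraction


def rebuild_normal(sign: int, exponent: int, fraction: int) -> float:
    """Reassemble a normalized value: (-1)^s * 1.f * 2^(e - 127)."""
    mantissa = 1.0 + fraction / 2 ** 23  # implicit leading 1 gives 24 bits
    return (-1.0) ** sign * mantissa * 2.0 ** (exponent - 127)


fields = decode_float32(-6.5)
print(fields)                   # (1, 129, 5242880)
print(rebuild_normal(*fields))  # -6.5
```

Here -6.5 is stored as -1.625 x 2^2: the exponent field holds 2 + 127 = 129 and the fraction field holds 0.625 x 2^23.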
Standards for FLP
Number Formats
FLP Number Formats
Sample Floating Point
DSPs
AMD - Athlon Processor
Xilinx Virtex-5 APU Floating Point Unit
Digital Core Design DFPAU ver 2.05
AMD - Athlon Processor, 2000
Includes the most powerful floating point engine for
x86 platforms
Delivers twice the peak x87 floating point
execution rate of the Intel Pentium III processor
Rivals the FP performance of many RISC
processors of that time
Superscalar and superpipelined
Higher clock frequencies
Higher overall throughput
Ref. [3]
AMD - Athlon Processor
2000
Ref. [3]
Xilinx Virtex-5 APU FLP Unit, 2009
Designed for the PowerPC 440 embedded microprocessor of
the Virtex-5 FXT FPGA family
Supports the IEEE-754 standard in single or double precision
Optimized for 2:1 and 3:1 APU:CPU clock ratios,
allowing the PowerPC processor to operate at maximum frequency
Applications:
Digital signal processing of high-quality audio or video signals,
where a very large dynamic range is needed to retain fidelity
Matrix inversion in wireless communications and radar
Digital signal processing tasks and spectral methods such as FFT
Statistical processing,
where floating-point is often the simplest way to avoid integer
overflow and rounding errors
Xilinx Virtex-5 APU FLP Unit, 2009
Increased Processing Capacity
Hardware floating-point operations complete faster than the equivalent
software emulation routines
The floating-point operators within the FPU are pipelined,
so multiple floating-point calculations can proceed in parallel
The FPU is autonomous:
the PowerPC processor internal pipeline can continue to execute integer instructions
while floating-point operations are handled by the FPU in parallel
IEEE 754-1985 / Book-E Standard Compatibility
The IEEE 754 standard represents very small (denormalized) numbers by allowing
significands of the form 0.x in addition to the usual 1.x used by normalized FLP
numbers
In Book-E, the multiply part of a multiply-add operation should not round
its result before supplying it to the addition part
The FPU treats all not-a-number (NaN) values as quiet NaNs, which do
not cause exceptions. When a floating-point operation results in a NaN
because one of the inputs was a NaN, the input NaN is not propagated
to the output; the default quiet NaN value is provided instead. This value is
0x7ff8000000000000 in double precision and 0x7fc00000 in single
precision
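The special encodings mentioned on this slide (significands of the form 0.x, and NaN values) can be told apart from the raw bit pattern. The following is a minimal sketch that assumes plain IEEE-754 single-precision encoding rules; it is not code from the Xilinx data sheet, and `classify_float32` and `bits_to_float32` are hypothetical helper names.

```python
import math
import struct


def classify_float32(bits: int) -> str:
    """Classify a 32-bit pattern using IEEE-754 single-precision rules."""
    exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    if exponent == 0xFF:
        if fraction == 0:
            return "infinity"
        # The top fraction bit distinguishes quiet from signalling NaNs.
        return "quiet NaN" if fraction & 0x400000 else "signalling NaN"
    if exponent == 0:
        # Exponent field 0 means the significand has the form 0.x.
        return "zero" if fraction == 0 else "denormalized"
    return "normalized"


def bits_to_float32(bits: int) -> float:
    """Reinterpret a 32-bit pattern as a single-precision value."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]


for pattern in (0x3F800000, 0x00000001, 0x7F800000, 0x7FC00000, 0x7FA00000):
    print(f"0x{pattern:08x}: {classify_float32(pattern)}")
```

The smallest denormalized pattern, 0x00000001, decodes to 2^-149, far below the smallest normalized value 2^-126, which is how the 0.x form extends the range toward zero.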
Xilinx Virtex-5 APU FLP
Unit
Ref. [4]
Digital Core Design DFPAU ver. 2.05, 2010
An FLP arithmetic coprocessor
Directly replaces C software functions with
equivalent, very fast hardware operations
Significantly accelerates system performance
Does not require any programming
Everything is done automatically during software
compilation by the DFPAU C driver
Supports addition, subtraction, multiplication,
division, square root, comparison and absolute value
The input number format complies with IEEE-754
Each floating point function can be turned on or off
at configuration level,
which makes the DFPAU module flexibly scalable
Technology-independent design
Digital Core Design DFPAU ver.
2.05, 2010
Ref. [5]
Architectural Modification to
Improve FLP Unit in FPGAs, 2008 [1]
Variable length shifters account for over 30%
of an adder and 25% of a multiplier
Coarse-grained approach: embedded shifter
Fine-grained approach: 4:1 multiplexer
Embedded shifter: consumed chip area 1.5%, saved area 14.6%, increased clock rate 3.3%
4:1 multiplexer: consumed chip area 0.48%, saved area 7.3%, increased clock rate 11.6%
Low power FLP Unit, 2009 [2]
Designed for embedded systems applications that need
low power consumption and fast processing
Performs basic operations such as addition,
subtraction, multiplication and division
Idea:
The functional units (adder, shifter, registers) are
shared between different operations
Advantage: saves silicon area
Disadvantage: increases the number of
cycles required to perform an operation
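The area-versus-cycles trade-off can be illustrated with a toy model. The sketch below is not the architecture of Ref. [2]; it is a generic shift-and-add multiplier, assumed here purely for illustration, in which a single adder and shift step are reused once per bit instead of building a full parallel multiplier array. The name `shared_unit_multiply` and the 24-bit default width are hypothetical choices.

```python
def shared_unit_multiply(a: int, b: int, width: int = 24):
    """Multiply two unsigned width-bit integers by reusing one adder.

    Each loop iteration models one clock cycle: the shared adder is
    used at most once, then the operands are shifted by one bit.
    Returns (product, cycles).
    """
    limit = 1 << width
    if not (0 <= a < limit and 0 <= b < limit):
        raise ValueError("operands must fit in the given width")
    accumulator = 0
    cycles = 0
    for _ in range(width):
        if b & 1:
            accumulator += a  # the single shared adder
        a <<= 1               # shift step: align the next partial product
        b >>= 1               # shift step: expose the next multiplier bit
        cycles += 1
    return accumulator, cycles


product, cycles = shared_unit_multiply(0xABCDEF, 0x123456)
print(hex(product), cycles)
```

In this model a 24-bit mantissa multiplication takes 24 cycles on the shared hardware, where a dedicated multiplier array would spend more area to finish in fewer cycles.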
Low power FLP Unit 2009
Ref. [2]
Low power FLP Unit 2009
Ref. [2]
Reconfigurable FLP Unit, 2009 [7]
Non-numerical applications usually have very
few FLP operations
The FLP unit is idle most of the time
In idle mode, the floating-point unit still
consumes power and its die area is wasted
Idea:
A reconfigurable floating-point unit that provides
both integer and floating-point operations
Reconfigurable FLP Unit
rAMM Array
Ref. [7]
Reconfigurable FLP Unit
Ref. [7]
Reconfigurable FLP Unit
Ref. [7]
References
1. M. Beauchamp, et al., "Architectural modifications to enhance the floating-point performance of FPGAs," IEEE Transactions on Very Large Scale Integration Systems, vol. 16, p. 177, 2008.
2. R. Neves, et al., "A Floating Point Unit Architecture for Low Power Embedded Systems Applications," XXIV SIM - South Symposium on Microelectronics, 2009.
3. AMD Athlon Floating Point Engine, "AMD Athlon Processor Floating Point Capability: The Most Powerful, Architecturally Advanced Floating Point Engine Ever Delivered in an x86 Microprocessor," white paper, 2000.
4. Xilinx, "Virtex-5 APU Floating-Point Unit v1.01a," Data Sheet DS693, 2009.
5. DFPAU floating-point pipelined divider, 2010, <http://www.altera.com>.
6. G. Frantz and R. Simar, "Comparing Fixed and Floating Point DSPs," SPRY061, Texas Instruments, 2004.
7. Y. Lee and J. Jou, "Design of A Reconfigurable Floating-Point Unit," 2009.
Embedded shifter block diagram
Ref. [1]
4:1 Multiplexer
Ref. [1]