The first commercially available IC digital-signal processor, Intel’s 2920, was replacing analogue filter banks in full-duplex, 1200-bps, digital hardware modems as early as 1979. At the same time, rapidly growing numbers of microprocessors and peripherals increased the feasibility of handling signals in numerical representation. Until about that time, just about any commercial
Illustration by Dan Guidera
implement digital filters. But before getting involved in a DSP-versus-microcontroller debate, you may ask: Why use digital filters? What else are DSPs good for?

The classic justification for digital filtering is that you can implement a linear-phase FIR (finite-impulse-response) filter that preserves signal fidelity in applications such as audio processing. Avoiding the signal distortion caused by unequal group delays, which result from nonlinear phase-versus-frequency response characteristics, can also be essential when you’re trying to process sensor signals. As anyone who has tried knows, building a linear-phase filter in analogue technology is next to impossible; by comparison, a DSP and a software-filter toolbox make such realisations trivial.

AT A GLANCE
• DSP mandates hard-real-time operation.
• You can’t avoid math, but you can keep it simple.
• Starter kits provide a low-cost development experience.
• Software tools automate algorithm development.
• Profilers help meet real-time deadlines.

If you use simulation tools for control-system modelling, you know that tools such as Matlab and Simulink from The Mathworks also model DSP algorithms and automatically generate code that you can port to a variety of hardware targets. But filtering is only the start of digital-signal processing’s abilities. For example, software tools also painlessly implement FFTs (fast Fourier transforms). You can then perform a frequency analysis on a snapshot of a continuous-time signal. This ability can help isolate a characteristic frequency from a sea of noise, such as detecting detonation in a car’s cylinder head by screening microphonic noise from a piezoresistive knock sensor. Elsewhere, you might use an FFT to scan the characteristic frequency emissions of a rotating machine to detect, say, the onset of a bearing failure in a helicopter’s power train. DSPs also come into their
bler code within CodeWarrior’s editor.

If you use the prebuilt stationery, the project window opens to reveal code, support, and library subdirectories; expanding any of these accesses the appropriate resource, which you can then edit. Select the “run” icon within the project window, and the default settings compile, link, and download the template program to the development board’s external SRAM. The thread window opens, giving you a view of the stack, variables, and source code, together with execution controls, such as set/clear breakpoint, run, kill, and various step options. Examine the available tools under the View menu, such as expression, global-variable, and register views. These facilities include raw-data, disassembly, source, and mixed-mode memory representations; you can also directly edit memory locations from here. If you now run the program, the console window greets you and sorts an array of numbers into ascending order. The kill icon stops execution, ready for another compile-and-link session.

You could, for example, try the simulator, which can report the number of machine cycles and instructions within a procedure. But you won’t find a formal profiling tool, so you must time critical tasks yourself. One alternative is to instrument your code at critical points by including calls to start and stop on-chip timers; subtracting the call overhead from the measurement yields the true execution time. Of course, if your code runs without missing deadlines with the instrumentation in place, removing the overhead shouldn’t negatively impact operation. Because significant performance differences can exist between on- and off-chip memory use, be sure to test the true target configuration.

CodeWarrior installs just one F805 example, which detects interrupts from onboard switches to blink alternate LEDs. You can use this example as guidance for using C-compiler pragma directives, writing interrupt-service routines, and saving and calling library functions. You may now want to explore the SDK routines, but if you want CAN (controller-area-network) examples, you’re out of luck: This capability is a $3000 option within the extra-cost premium package. Other premium capabilities include security routines, such as RSA (Rivest, Shamir, and Adleman), DES (data-encryption-standard), and triple-DES algorithms, as well as telecom and speech-processing functions, such as the G.711 voice coder. The SDK’s CD-ROM includes the driver user’s manual for these options, so you can see what’s available. But you can access a range of free motor- and motion-control examples, a variety of modem and telephony applications, and a plethora of tools and general-purpose routines. Notice that the SDK also demonstrates Motorola’s specification for portable algorithm development, allowing third-party vendor support.

Other development tools include the $200 parallel-to-JTAG command converter; a USB-to-JTAG converter is currently in development. You can also specify the $1999 PCI-to-JTAG emulator connection and its $2999 Ethernet equivalent. At the other end of the spectrum, and possibly a better option for curious users who don’t need the F805’s I/O power, is the $65 DSP56F801 development kit. This kit also includes parallel-cable emulation, the SDK, and a free version of CodeWarrior that’s limited to 16 kbytes of program memory; the unlimited IDE costs $495.

MULTIMEDIA LIGHTS UP REAL-TIME CONTROL

Primarily targeting multimedia and portable equipment, Analog Devices’ Blackfin family comprises three code-compatible processors that differ only in speed and on-chip memory complement. But these chips equally suit real-time-control applications, with peripherals that include a UART, SPI, and serial ports; timers with PWM and pulse-measurement capabilities; a real-time clock and watchdog timer; and a flexible GPIO structure. Other system units include DMA and interrupt controllers, as well as hooks to control external flash memory and SDRAM (Figure 2). Video ports that support the ITU-R-656 format also simplify industrial machine-vision applications, as well as consumer/professional-video products that employ the 525/625-line component-video format.

Figure 2
[Block diagram: core/system-bus interface, parallel-peripheral interface/GPIO, DMA controller, two serial ports, SPI port, and boot ROM]
Although marketing suggests multimedia applications, comprehensive I/O equally suits Analog Devices’ Blackfin family to real-time acquisition and control tasks.

Figure 3
[Block diagram: C64x DSP core (instruction fetch, dispatch, and decode; datapaths A and B with register files A0-A31 and B0-B31 and .L, .S, .M, and .D units; control registers and logic; advanced in-circuit emulation; interrupt control), Viterbi and turbo-code coprocessors, direct-mapped 16-kbyte level-one program memory, two-way set-associative 16-kbyte level-one data cache, 1024-kbyte level-two memory, 64-bit enhanced memory interface A and 16-bit interface B, 64-channel enhanced DMA controller, three timers, three multichannel buffered serial ports (port 2 alternatively UTOPIA), 32-bit host-port interface or PCI, and GPIO[15:0]]
The 320C6416 from Texas Instruments includes dedicated communications coprocessors that accelerate applications such as wireless base stations.

Internally, the Blackfin core comprises a dual-MAC fixed-point processing engine that stores results in two 40-bit accumulators. In use, the VisualDSP++ IDE’s compiler automatically synthesises floating-point operations in software using a default 32-bit “double” type that best suits the core’s architecture; you can optionally change this behaviour to comply with ANSI C specifications. The core supports operations on signed or unsigned integer data via two 40-bit ALUs and a 40-bit shifter that perform traditional 16- and 32-bit arithmetic and logical operations. This 40-bit length supports rounding and saturation following repeated MAC cycles. There are also four video ALUs that accelerate processing via functions such as byte-alignment and packing operations, 16- and 8-bit additions with clipping, and 8-bit averaging and subtract/absolute-value/accumulate instructions. The ADSP-BF533 chip that ships with the starter kit is a top-of-the-range, 600-MHz version that packs 148 kbytes of memory into its 12-mm-sq, 160-pin, 0.8-mm-pitch BGA package. Guide prices range from $7 (1000) for the 300-MHz chip to $23.50 (1000) for the 600-MHz version.

Available now for $295, the ADDS-BF533-EZlite starter kit includes a 127×178-mm pc board that carries the processor, 2 Mbytes of flash, and 32 Mbytes of SDRAM. Onboard peripherals include an AD1836 audio codec with four input and six output channels; an ADV7171 video encoder and ADV7183 decoder, each with three phono jacks; an ADM3202 RS-232 line driver/receiver for the on-chip UART that’s wired to a DB9 connector; an expansion interface that carries I/O, such as the parallel interfaces, serial interfaces, and SPIs; a JTAG in-circuit-emulator header; and various LEDs and pushbuttons. The package relies on a 20-kbyte, code-limited version of the VisualDSP++ IDE for program development. Debugging communications employ the onboard USB-to-JTAG debugging interface, which permits nonintrusive communications with a host PC via the environment’s background-telemetry channel. Documentation includes the evaluation-system manual, and a CD-ROM includes device data sheets and references, such as the programming manual; optionally, you can order printed versions from the vendor’s Web site.

Ignore the quick-start card; follow the instructions in the evaluation-system manual, and the VisualDSP++ IDE installation proceeds faultlessly. (You must have the starter kit connected to avoid “invalid-licence” errors.) Notice that you’ll need Windows 98/2000/XP; the software doesn’t support NT, because NT doesn’t natively support USB. Familiarise yourself with the CD-ROM’s contents, and then try the tutorial, one of a wealth of possible starting points.

The first two exercises show how basic components, such as the linker, interact by having you build, run, and modify a procedure that calculates each sum of three products from fixed data arrays. Interestingly, the “mixed” view in the C-source window allows you to simultaneously compare C and assembler implementations of the routines. This facility is an alternative to some IDEs’ ability to drag a source line into an assembler window, or vice versa, to locate the corresponding code fragment. Other editor features allow setting and clearing of breakpoints, enabling line numbering, and setting bookmarks. If you load the code example in lesson three into the previous session, you can see various facets of the IDE’s data-visualisation capabilities, such as normal plots, their FFT equivalents, and a FIR-filter-response analysis. Because this and other current Blackfin examples target the 535 processor, change the target to 533, or you’ll encounter load-error failure messages. Giuseppe Olivadoti of the Blackfin tools marketing team says that Analog is porting all 535 examples to suit the 533.

Because the kit licence doesn’t permit simulator sessions, download the full software and its evaluation licence, together with the licence-manager utility.
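The Blackfin core described above keeps MAC results in 40-bit accumulators so that repeated MAC cycles can run before rounding or saturation, and the first tutorial exercise computes sums of three products from fixed arrays. The following portable C sketch models that idea under stated assumptions: it is not Blackfin code, uses no Analog Devices intrinsics, and assumes a hypothetical accumulator that saturates at 40 bits and a result that saturates to 32 bits on extraction, rather than reproducing the core’s exact rounding and saturation rules. The names `mac16_sat32`, `sat40`, and `sat32` are illustrative.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical 40-bit accumulator model: an int64_t clamped to the
   signed 40-bit range [-2^39, 2^39 - 1]. Not Blackfin's exact rules. */
#define ACC40_MAX ((int64_t)0x7FFFFFFFFF)
#define ACC40_MIN (-ACC40_MAX - 1)

/* Clamp a value to the 40-bit accumulator range. */
static int64_t sat40(int64_t x)
{
    if (x > ACC40_MAX) return ACC40_MAX;
    if (x < ACC40_MIN) return ACC40_MIN;
    return x;
}

/* Clamp an accumulator value to a 32-bit result. */
static int32_t sat32(int64_t x)
{
    if (x > INT32_MAX) return INT32_MAX;
    if (x < INT32_MIN) return INT32_MIN;
    return (int32_t)x;
}

/* Sum n products of 16-bit operands in the 40-bit accumulator, then
   saturate once on extraction; the 8 guard bits above a 32-bit product
   let many products accumulate without intermediate overflow. */
int32_t mac16_sat32(const int16_t *x, const int16_t *h, size_t n)
{
    int64_t acc = 0;
    for (size_t i = 0; i < n; i++) {
        int32_t product = (int32_t)x[i] * (int32_t)h[i];  /* |product| <= 2^30 */
        acc = sat40(acc + product);
    }
    return sat32(acc);
}
```

With three taps of full-scale data (32767 × 32767, three times), the sum is about 3.2 × 10^9: It would wrap in a 32-bit accumulator but fits comfortably in 40 bits, and extraction then clamps it to INT32_MAX instead of producing a wrapped, negative result.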
Figure 4
[Execution graph plotting threads (swiTxJoin, swiControl, and other threads), LOG_message events, PRD ticks, and assertions against time]
Embedded instrumentation allows the Code Composer IDE to graph execution statistics in real time.
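The CodeWarrior section above suggests instrumenting code with calls that start and stop an on-chip timer and then subtracting the call overhead, and Code Composer’s statistics window reports a thread’s average- and worst-case execution times. This sketch combines the two ideas in portable C. It assumes a free-running, up-counting 32-bit timer whose raw readings you pass in; reading that timer is hardware-specific and not shown, and the `ExecStats` type and both function names are hypothetical.

```c
#include <stdint.h>

/* Running statistics for one instrumented code section (hypothetical type). */
typedef struct {
    uint32_t count;   /* number of measurements recorded */
    uint64_t total;   /* sum of corrected durations, in timer ticks */
    uint32_t worst;   /* longest corrected duration seen */
} ExecStats;

/* Record one measurement. start and stop are raw readings of an assumed
   free-running, up-counting 32-bit timer taken around the code section;
   overhead is the difference between two back-to-back readings. */
void stats_record(ExecStats *s, uint32_t start, uint32_t stop, uint32_t overhead)
{
    uint32_t raw = stop - start;                         /* modulo 2^32: survives one wrap */
    uint32_t t = (raw > overhead) ? raw - overhead : 0;  /* remove call overhead */
    s->count++;
    s->total += t;
    if (t > s->worst)
        s->worst = t;
}

/* Average corrected duration, or 0 if nothing has been recorded. */
uint32_t stats_average(const ExecStats *s)
{
    return s->count ? (uint32_t)(s->total / s->count) : 0;
}
```

Measure the overhead once by taking two timer readings with no code between them, then pass that difference to every `stats_record()` call; unsigned subtraction keeps a measurement correct even if the timer wraps once between start and stop.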
Running the 60-Mbyte combination of full software and evaluation licence upgrades your installation; after 90 days, it reverts to the original restrictive version. After selecting the single-simulator session option, you can now see the linear-profiling simulation tool that analyses where the processor spends time within an application. Double-clicking on a routine shows the underlying assembler, together with the percentage of time spent within the flow; double-clicking on a program-counter entry highlights the corresponding disassembler output. Olivadoti explains that the profiler allows you to run the whole application in the profiling environment, rather than just one function at a time, although running one function at a time is possible. “Once the application is loaded, the profiler engine and a patented feature of the hardware displays the percentage and total execution count information about the application without inserting any wrapper code around the application,” he says.

Software-profiling tools typically insert intrusive “wrapper code” at the head and the tail of a function to obtain statistics. VisualDSP++ dispenses with this overhead by taking statistically random samples of the program counter to record instruction addresses, derive execution statistics, and reveal code bottlenecks. Olivadoti also remarks that the simulator has several sophisticated and differentiating features, such as cache and pipeline visualisation. Because these features offer so much detail, Olivadoti notes, they are “simulator-only features.” He says that, at this level, it’s currently impossible to unobtrusively interrogate a processor at full speed without adding “a ton of logic and testing pins to the part.”

The ADSP-BF533 folder that the environment creates during installation includes a number of 533-specific examples that you can use as templates for further development. Each includes a readme.txt file that describes functions and execution details. One major obstacle for newcomers is C programmers’ propensity for hiding essential components, such as header files, and obscuring their interactions; the good news is that the examples that accompany this kit are crystal-clear. You’ll also encounter VCSE (VisualDSP++ component-software-engineering) modules, which are preconfigured routines written to Analog Devices’ VCSE standard. This approach allows you to add conformant modules from Analog Devices and third-party vendors to your own projects without fear of contentions, such as memory or I/O clashes.

However, the performance monitor that the kit’s documentation describes won’t appear in the menu system until you upgrade to the full version. You’ll also notice that the environment is sluggish, due to the background-telemetry-channel agent that gathers information as the processor runs. Permanently overcoming this and other restrictions will cost you around $4000 for either the USB or the PCI-bus emulator, plus $3500 for the full software licence. But you’ll no doubt explore the depth of the online documentation during your experiments, and it will give you a taste of the power that the unrestricted product offers. For example, there’s a useful section on porting the IDE to new and custom hardware environments. Also, don’t forget to revisit the vendor’s Web pages, where you’ll find facilities such as updated information, a DSP users’ community, and many more code examples.

1-GHz POWER SPEEDS CONCURRENT I/O

If you have a taste for experimentation, you may wonder what’s involved in tackling a truly complex processor, such as Texas Instruments’ TMS320C6000 family, which usefully offers code compatibility between fixed- and floating-point derivatives. A new starter kit showcases a 600-MHz version of the 16-bit fixed-point 6416, which follows the vendor’s 256-bit very-long-instruction-word model. This top-end machine targets I/O-intensive applications, such as communications infrastructures. For example, the chip packs an ATM (asynchronous-transfer-mode) interface into its 532-pin, 0.8-mm-pitch BGA outline, which measures just 23 mm sq; other communications peripherals include Viterbi and turbo-decoder coprocessors. General-purpose peripherals comprise one 16-bit and one 64-bit external-memory-interface port; a 64-channel DMA controller; three multichannel, buffered serial ports; three 32-bit timers; a port that’s selectable between PCI (peripheral-component-interconnect) and HPI (host-processor-interface) operation; and 16 bits of general-purpose I/O (Figure 3). With this amount of potentially concurrent I/O, you might need the vendor’s forthcoming 1-GHz processor. But, thanks to a common code base, you can develop using the 6416 and target another, less I/O-capable family member. For example, the base 6411 chip omits the communications coprocessors to reduce cost from the 6416’s $145 (1000) to $53 (1000).

Manufactured by Spectrum Digital, the starter kit comprises a 115×225-mm pc board that carries the DSP, a TLV320AIC23 codec, 16 Mbytes of SDRAM, 512 kbytes of user flash, and a USB-to-JTAG emulation port, all for around $395. Four 3.5-mm stereo I/O ports connect with the codec; some simple user-I/O ports connect to a bank of four LEDs and piano-key switches; high-density expansion connectors provide additional memory, peripherals, and PCI/HPI connectivity; and two port options support external emulators. Despite the hardware’s complexity, user-friendly software showcases Texas Instruments’ eXpressDSP philosophy for modular software development. Ideally, eXpressDSP abstracts applications well enough that nonprogrammers can build applications by combining off-the-shelf algorithms within an execution template. In practice, you need sound C-programming experience to maximise the power that’s available to you.

40 edn europe | September 2003 www.edn.com

The core of eXpressDSP is the DSP/BIOS kernel, a real-time scheduler that comprises various modules to handle interrupts, data-streaming pipes, and periodic events. In effect, DSP/BIOS forms the basis of an RTOS that supports the 320C family. The Code Composer Studio IDE has a DSP/BIOS configuration tool that lets you pick which modules to include at program compilation time, thus minimising the firmware’s memory footprint. Crucially, each DSP/BIOS module includes code instrumentation that permits Code Composer to debug running applications via the USB-emulation link. For minimum impact on real-time operation, the instrumentation runs when the DSP chip is in its idle state between executing code threads. The execution graph displays the processor’s switches between threads in a format that helps reveal missed deadlines (Figure 4). Other tools that ease time-critical task analysis include a CPU load graph and a statistics window that reports a thread’s average- and worst-case execution times. You can also pipe in data from a PC-resident file, such as the output from a DSO that represents a real signal you want to process.

DSP/BIOS provides a well-defined set of APIs that apply across the entire 320Cxx family, so you can easily port code between alternative platforms. By banning direct access to peripherals and insisting upon re-entrant, relocatable code, the eXpressDSP algorithm standard permits easy integration of compliant algorithms from Texas Instruments and third-party vendors. You can download a free DSP/BIOS driver-development kit.

You can reach Contributing Editor David Marsh at forncett@btinternet.com.

You can also freely download a set of three reference-framework applications as “starterware” for most of TI’s starter kits. Comprehensive application notes describe the capabilities of these models, including examples of how to customise them. RF1 targets applications with a footprint of approximately 3.5k words that require one to three channels and a similar number of eXpressDSP algorithms. Limitations include no support for dynamic memory allocation, thread pre-emption, blocking, multirate operation, or control functions. For these reasons, RF1 best suits lower-end 5xxx hardware platforms. The midrange RF3 typically suits 6xxx chips that run one to 10 channels and one to 10 algorithms. The RF3’s approximately 11k-word footprint provides all of the facilities missing from RF1 except task blocking. Finally, the RF5 version provides all such facilities, together with support for more than 100 channels and algorithms, at the cost of some 28k words of memory. RF5 enhancements include a separate thread for multiprocessor support.

Your starter kit may include an eXpressDSP for Dummies booklet, which targets first-time users. At press time, the RF3 application that this booklet studies was unavailable for the 6416 starter kit. But an e-mail to Elizabete de Freitas, a DSP-software field-application engineer for TI Germany, produced a port within a few hours, suggesting that C programmers should have little difficulty in tailoring reference-framework applications to new environments. RF3 describes a two-channel audio-processing chain, in which one channel applies a highpass and the other a lowpass FIR filter, each with its own volume control. You first learn how to build and run the application using, for example, a PC’s CD-ROM and sound card. The booklet then describes the internal structure of the sample application; core components, such as header files, include comprehensive documentation. Finally, you get to modify the application to produce a mono audio-recorder/player. You start by removing one channel to leave mono sound that’s either highpass- or lowpass-filtered. In practice, the RF3 application-note instructions don’t quite work, but the debugging environment helps you by highlighting errors and warnings in red; double-clicking on an error takes you directly to the offending line in the appropriate file. Exercises also present important tools, such as the execution graph. An unlimited version of Code Composer Studio costs $3595, including one year’s maintenance; Spectrum Digital’s XDS510 USB emulator adds $1995.
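The RF3 chain described above, one highpass and one lowpass FIR channel, each with its own volume control, maps onto a few lines of C. This is a hypothetical floating-point sketch, not TI’s RF3 source and not the eXpressDSP APIs; the `Channel` type, the function names, and the four-tap coefficient sets (a moving average for the lowpass channel and its sign-alternated counterpart for the highpass channel) are placeholders chosen only to make the behaviour easy to check.

```c
#include <stddef.h>

#define NTAPS 4

/* Placeholder coefficient sets (hypothetical, not RF3's filters):
   a 4-tap moving average (lowpass) and its sign-alternated version (highpass). */
static const float LOWPASS[NTAPS]  = { 0.25f,  0.25f, 0.25f,  0.25f };
static const float HIGHPASS[NTAPS] = { 0.25f, -0.25f, 0.25f, -0.25f };

/* One channel of a hypothetical two-channel chain. */
typedef struct {
    float coeff[NTAPS];
    float delay[NTAPS];   /* delay[0] holds the newest sample */
    float volume;         /* linear gain: 1.0f = unity */
} Channel;

/* Load a channel's coefficients and volume, and clear its delay line. */
void channel_init(Channel *ch, const float coeff[NTAPS], float volume)
{
    for (size_t k = 0; k < NTAPS; k++) {
        ch->coeff[k] = coeff[k];
        ch->delay[k] = 0.0f;
    }
    ch->volume = volume;
}

/* Push one sample through the channel's FIR filter, then apply its volume. */
float channel_process(Channel *ch, float in)
{
    for (size_t k = NTAPS - 1; k > 0; k--)   /* age the delay line */
        ch->delay[k] = ch->delay[k - 1];
    ch->delay[0] = in;

    float acc = 0.0f;
    for (size_t k = 0; k < NTAPS; k++)
        acc += ch->coeff[k] * ch->delay[k];  /* multiply-accumulate */
    return acc * ch->volume;
}

/* Process interleaved stereo frames (L, R, L, R, ...): channel 0
   filters the left samples and channel 1 the right samples. */
void chain_process(Channel ch[2], const float *in, float *out, size_t frames)
{
    for (size_t i = 0; i < frames; i++) {
        out[2 * i]     = channel_process(&ch[0], in[2 * i]);
        out[2 * i + 1] = channel_process(&ch[1], in[2 * i + 1]);
    }
}
```

With a constant (DC) input, the lowpass channel settles to the input times its volume setting, while the highpass channel settles to zero. In this sketch, the booklet’s mono recorder/player modification amounts to calling `channel_process()` on only one of the two channels.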
ARCHITECTURE SPEEDS REAL-TIME DELIVERY

Although embedded control often mandates hard-real-time operation, DSP operations are by nature hard-real-time operations; miss a processing deadline, and you distort your data. This observation has implications for system hardware, software, and tools alike.

The DSP’s innate math abilities depend on several key adaptations to processor architecture. Like a microcontroller core, a DSP basically comprises a program-control unit, a data-control unit, and a computational unit (Figure A).

Comprising an instruction register, a program sequencer, and selective cache memory, the program-control unit fetches the next instruction from program memory, decodes it, and issues control signals to the processor’s core. But, unlike a microcontroller’s, this DSP’s data-control unit has two data-address generators that concurrently calculate the addresses of operands in separate data and program memories (“Harvard” architecture) and pass this data to the computation unit. The selective program cache memory prevents access conflicts when instructions repeatedly demand two data items. An extended computation unit adds a hardware MAC (multiply-accumulate) unit to the familiar ALU (arithmetic-logic unit) and typically includes a barrel shifter to accelerate shifts and rotates.

The Harvard architecture maximises throughput for parallel, repetitive operations, such as concurrently fetching the two operands you need for the MAC operations that typify filter inner-loop calculations. Unlike a microcontroller’s data flow of sequential load-compute-store cycles, a DSP characteristically performs its instruction fetch, data fetches, and computation, and makes a new result available, within a single processor cycle. From a traditional automation and control viewpoint, this style of repetitive-loop computation on arrays of data contrasts with the multisensor data acquisition and control-signal derivation that typify PID applications.

Originally, constraints on silicon real estate meant that DSP instruction sets had to be compact, often packing several complex instructions into one 16-bit word. As a result, compiler writers had great difficulty in matching the memory-usage efficiency of handwritten assembly statements, creating the need for specialist DSP programmers.

Recognising this obstacle to DSP adoption in the mid-1990s, chip designers at Texas Instruments also perceived that emerging applications, such as cell phones, were demanding DSP functions but stressing algorithms, such as Viterbi and turbo-code routines, which share little with traditional filter algorithms. To open programming to a wider application and skill base, high-level-language-compiler support was essential to replacing quirky assembler scripts. TI developed a very-long-instruction-word architecture that’s computationally efficient and friendly to compiler writers, because it can concurrently execute multiple instructions using a RISC-like instruction set. For example, the inner loop of a 320C6xxx filter routine computes two MAC results per cycle by executing as many as eight 32-bit instructions in parallel. The core’s control logic comprises program fetch, dispatch, and execution hardware to speed data flow through the instruction pipeline (Figure 3).

After decoding, each instruction can control one of the core’s eight processing units, which are arranged as two banks of four, each bank operating on its own parallel datapath. Each of these four processing-unit types is optimised to handle different operations, such as general arithmetic and logical computations, multiplication, data shifts and register-to-register transfers, and load/store and complex address generation.

Because the instruction-fetch packet length is constant at 256 bits, the instruction-set format signals that instructions are chained for concurrent execution by setting the least-significant bit in each 32-bit code. When the chain breaks with a zero in this bit position, the current instruction closes its execute packet, and control logic schedules the subsequent instructions for the next execute packet. The instruction-fetch logic does not fetch a new 256-bit fetch packet until the current fetch packet completes. Assembly-language programmers specify instructions for parallel execution by prefixing statements with a double-pipe symbol; if an execute packet would cross a 256-bit boundary, the assembler moves the excess into the next fetch packet and pads the unfilled instruction slots with no-operation codes.
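The chaining rule in the last paragraph can be expressed as a short C function that splits one 256-bit fetch packet (eight 32-bit instruction words) into execute packets. It’s a simplified sketch: It examines only bit 0 of each word and, in line with the assembler padding described above, assumes that an execute packet never crosses a fetch-packet boundary. The function name is hypothetical, and real C6000 instruction decoding involves far more than this.

```c
#include <stdint.h>
#include <stddef.h>

#define FETCH_WORDS 8   /* one 256-bit fetch packet = 8 x 32-bit instructions */

/* Hypothetical helper: split one fetch packet into execute packets using
   the chaining bit (bit 0 of each instruction word).
   bit 0 = 1: the next instruction executes in parallel with this one.
   bit 0 = 0: this instruction closes the current execute packet.
   Stores each execute packet's size in sizes[] and returns the count. */
size_t split_execute_packets(const uint32_t fp[FETCH_WORDS], size_t sizes[FETCH_WORDS])
{
    size_t count = 0, run = 0;
    for (size_t i = 0; i < FETCH_WORDS; i++) {
        run++;
        uint32_t p_bit = fp[i] & 1u;
        /* Simplification: the last word always ends a packet, because
           packets are assumed not to span fetch packets. */
        if (p_bit == 0 || i == FETCH_WORDS - 1) {
            sizes[count++] = run;
            run = 0;
        }
    }
    return count;
}
```

For chaining bits 1, 1, 0, 0, 1, 0, 1, 0, the function reports four execute packets of three, one, two, and two instructions. In assembly source, prefixing an instruction with || sets the chaining bit of the instruction before it, which is how the two land in the same execute packet.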
Figure A
[Block diagram: cache memory and instruction register; separate program-memory and data-memory address and data buses linked by a bus exchange; input registers]
Analog Devices’ first-generation ADSP2100 shows the key features that differentiate a DSP from a general-purpose microcontroller.
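The sidebar’s description of two data-address generators feeding a MAC unit is easiest to see in a direct-form FIR filter, y[n] = Σ h[k]·x[n−k] for k = 0 to N−1. In this portable C sketch, one index walks backward through a circular sample buffer (the job a DSP’s modulo addressing does in hardware) while another walks forward through the coefficients, and each tap costs one multiply-accumulate. The tap count, the `Fir` type, and `fir_step` are illustrative assumptions; a symmetric coefficient set, h[k] = h[N−1−k], gives the linear-phase response that the main article mentions, with a constant group delay of (N−1)/2 samples.

```c
#include <stddef.h>

#define FIR_TAPS 5   /* illustrative tap count */

/* Hypothetical circular delay line; head indexes the newest sample. */
typedef struct {
    float x[FIR_TAPS];
    size_t head;
} Fir;

/* One output of a direct-form FIR: y[n] = sum over k of h[k] * x[n - k]. */
float fir_step(Fir *f, const float h[FIR_TAPS], float in)
{
    f->head = (f->head + 1) % FIR_TAPS;   /* modulo increment, like circular addressing */
    f->x[f->head] = in;

    float acc = 0.0f;
    size_t idx = f->head;                       /* sample index walks backward in time */
    for (size_t k = 0; k < FIR_TAPS; k++) {     /* coefficient index walks forward */
        acc += h[k] * f->x[idx];                /* one MAC per tap */
        idx = (idx == 0) ? FIR_TAPS - 1 : idx - 1;   /* modulo decrement */
    }
    return acc;
}
```

Feeding a single unit impulse through a zero-initialised filter returns the coefficients themselves, so the symmetric set {1, 2, 3, 2, 1} comes back as 1, 2, 3, 2, 1. On a Harvard-architecture DSP, both operands of each MAC can be fetched in the same cycle, so a hand-tuned version of this loop can approach one tap per cycle; in portable C, the compiler decides how close it gets.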