Digital Signal Processing architecture
INTRODUCTION TO PROGRAMMABLE DSPs
The programmable digital signal processors (P-DSPs) are designed with features that are specifically required for digital signal processing applications. The conventional microprocessors are meant for general purpose applications and hence they do not have these features. However, an advanced microprocessor or a RISC processor may use some of the techniques adopted in P-DSPs or may even have instructions that are specifically required for DSP applications. They may have performance close to that of a P-DSP for certain operations. For example, the DEC Alpha 21064 computes a 1024-point complex FFT in 480 µs, as compared to the Analog Devices ADSP 21050 that takes about 450 µs to carry out the same operation. However, in terms of low power requirement, cost, real-time I/O capability and availability of high speed on-chip memories, the P-DSPs have an advantage over the advanced microprocessors and the RISC processors. In this chapter some of the features specifically required for performing digital signal processing operations efficiently are discussed in detail.
MULTIPLIER AND MULTIPLIER ACCUMULATOR (MAC) 2.1
One of the most common operations required in digital signal processing applications is array multiplication. For example, convolution and correlation require array multiplication. In Chapter 1, it was shown how the array multiplication can be done using a single multiplier and adder. The implementation scheme is reproduced in Fig. 2.1. One of the important requirements of these array multipliers is that they have to process the signals in real time. Before the next sample of the input signal arrives at the input to the array, the array multiplication should be completed. This requires the multiplication as well as the accumulation to be carried out using hardware elements. There are two approaches to solve this problem. A dedicated MAC unit may be implemented in hardware, which integrates multiplier and accumulator in a single hardware unit. This approach is adopted by the Motorola DSP processor DSP5600X. The other approach is to have the multiplier and accumulator separate. For example, in the Texas Instruments DSP processor 320C5X, the output of the multiplier is stored into the product register. The content of this register in turn can be added to the accumulator register ACC in the central ALU. In both of the above approaches, the MAC operation can be completed in one clock cycle. The presence of hardware multipliers and/or multiplier accumulators is one of the mandatory requirements of a P-DSP.

Fig. 2.1 Implementation of convolver with single multiplier/adder
In Fig. 2.1, $y_n$, the output at the $n$th sampling instant, is obtained by multiplying the array $\mathbf{x}_n = [x_n, x_{n-1}, \ldots, x_{n-M+1}]$, corresponding to the present and the past $M-1$ samples of the input, with the array $\mathbf{h} = [h_0, h_1, \ldots, h_{M-1}]$ corresponding to the impulse response sequence. To obtain $y_{n+1}$, the input signal array $\mathbf{x}_{n+1}$ is multiplied with the array $\mathbf{h}$. The vector $\mathbf{x}_{n+1}$ is obtained by shifting the array $\mathbf{x}_n$ towards the right so that the $(n+1)$th sample of the input data, $x_{n+1}$, becomes the first element and all the elements of $\mathbf{x}_n$ are shifted towards the right by 1 position, so that the $i$th element of $\mathbf{x}_n$ becomes the $(i+1)$th element of $\mathbf{x}_{n+1}$. Instead of shifting the elements of $\mathbf{x}_n$ towards the right all at a time after finishing the vector multiplication, each of the elements may be shifted separately soon after the MAC operation that uses this element is over. For example, after obtaining the product $h_{M-1} x_{n-M+1}$, the location holding $x_{n-M+1}$ may be made equal to $x_{n-M+2}$. Similarly, after obtaining the product $h_{M-2} x_{n-M+2}$, the location holding $x_{n-M+2}$ may be made equal to $x_{n-M+3}$, and so on.
This is achieved in P-DSPs by using a special instruction called MACD (multiply accumulate with data shift). For example, TMS320C5X has the instruction MACD pma, dma, which multiplies the content of the program memory location pma with the content of the data memory with address dma and stores the result in the product register. The content of the product register is added to the accumulator before the new product is stored. Further, the content of dma is copied to the next location whose address is dma + 1.
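The multiply, accumulate and data move sequence can be illustrated with a small Python model. This is only a sketch under stated assumptions: the data memory is modelled as a list with one spare location at the end, the taps are processed from the oldest sample to the newest, and the name fir_macd is hypothetical. The pipelined use of the product register on a real TMS320C5X is not modelled.

```python
def fir_macd(h, x_mem, sample):
    """Compute one output of an M-tap FIR filter, MACD style.

    h      : impulse response coefficients h[0..M-1] ("program memory")
    x_mem  : M + 1 data memory locations; x_mem[k] holds x[n-k]
    sample : newest input sample x[n]
    """
    m = len(h)
    x_mem[0] = sample
    acc = 0
    # Start with the oldest sample so each data move overwrites a
    # location whose MAC has already been performed.
    for k in range(m - 1, -1, -1):
        product = h[k] * x_mem[k]   # multiply (product register)
        acc += product              # accumulate
        x_mem[k + 1] = x_mem[k]     # data move: copy dma to dma + 1
    return acc


# Impulse input: the outputs reproduce the impulse response.
h = [1, 2, 3]
memory = [0] * (len(h) + 1)
outputs = [fir_macd(h, memory, s) for s in [1, 0, 0, 0]]
print(outputs)
```

Because every sample has already been moved by the time the loop finishes, the next input sample can simply be written to the first location without a separate shifting pass.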
MODIFIED BUS STRUCTURES AND MEMORY ACCESS SCHEMES IN P-DSPs 2.2
It may be noted that the MAC operation with data move (i.e. the MACD instruction) requires four memory accesses per instruction cycle. (An instruction cycle is the time that elapses from when an instruction is fetched till that instruction completes execution, including the time taken for writing the result into a register or memory. Many of the instructions in P-DSPs, including the MACD instruction, require only one processor clock period per instruction cycle. In the conventional microprocessors one instruction cycle corresponds to several clock periods.) The four memory accesses per clock period required for the MACD instruction are as follows:
1. Fetch the MACD instruction from the program memory
2. Fetch one of the operands from the program memory
3. Fetch the second operand from the data memory
4. Write the content of the data memory with address dma into the location with the address dma + 1
The relatively static impulse response coefficients are stored in the program memory and the samples of the input data are stored in the data memory. If the MACD instruction is to be executed in a machine with Von Neumann architecture, it requires four clock cycles. This is because in the Von Neumann architecture shown in Fig. 2.2 there is a single address bus and a single data bus for accessing the program as well as the data memory area. One of the ways by which the number of clock cycles required for the memory access can be reduced is to use more than one bus for both address and data. For example, in the Harvard architecture shown in Fig. 2.3, there are two separate buses for the program and data memory. Hence the content of program memory and data memory can be accessed in parallel. The instruction code can be fed from the program memory to the control unit while the operand is fed to the processing unit from the data memory. The processing unit, consisting of the registers and processing elements such as MAC units, multiplier, ALU, shifter, etc., is also referred to as the data path.

Fig. 2.2 Von Neumann architecture
Fig. 2.3 Harvard architecture
Fig. 2.4 Modified Harvard architecture

The P-DSPs follow the modified Harvard architecture shown in Fig. 2.4. One set of buses is used to access a memory that has both program and data, and another to access a memory that has data alone. Data can also be transferred from one memory to another. The modified Harvard architecture is used in several P-DSPs, for example P-DSPs from Texas Instruments and Analog Devices.
With the Harvard architecture, the number of memory accesses per clock cycle was shown to be two. This can be increased further by using a larger number of buses. For example, by using three separate address and data buses, the number of memory accesses per clock cycle can be increased to three. Motorola DSP5600X, DSP96002, etc. have three separate buses. TMS320C54X has four address buses.
Since the cost of an IC increases with the number of pins in the IC, extending a number of buses outside the chip would unduly increase the price. Hence the P-DSPs use multiple buses only for connecting the on-chip memory to the control unit and data path. For accessing off-chip memory only a single bus is used for accessing both the program memory and data memory. Because of this, any operation that involves an off-chip memory is slow compared to that using the on-chip memory.
MULTIPLE ACCESS MEMORY 2.3
The number of memory accesses per clock period can also be increased by using a high speed memory that permits more than one memory access per clock period. For example, the DARAM, the dual access RAM, permits two memory accesses per clock period. Multiple access RAM may be connected to the processing unit of the P-DSP by using the Harvard architecture. For example, DARAM connected to a P-DSP with two independent data and address buses can be used to achieve four memory accesses per clock period.
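The relation between the number of buses, the accesses each memory allows per clock period and the clock cycles needed by MACD can be put into a few lines of Python. This is a simplified counting model and an assumption of this sketch: it supposes that the four accesses can be distributed freely over the available buses, which real devices do not always allow. The function name macd_cycles is hypothetical.

```python
import math

MACD_MEMORY_ACCESSES = 4  # instruction fetch, two operand reads, one write


def macd_cycles(buses, accesses_per_bus_per_cycle=1):
    """Clock cycles needed to complete the four MACD memory accesses."""
    accesses_per_cycle = buses * accesses_per_bus_per_cycle
    return math.ceil(MACD_MEMORY_ACCESSES / accesses_per_cycle)


print(macd_cycles(buses=1))                                # Von Neumann
print(macd_cycles(buses=2))                                # Harvard
print(macd_cycles(buses=2, accesses_per_bus_per_cycle=2))  # Harvard + DARAM
```

Under this model only the last configuration, two buses with dual access RAM, reaches the single-cycle MACD that the text describes.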
MULTIPORTED MEMORY 2.4
Another technique that is adopted for increasing the number of accesses per clock period is to use multiported memory. For example, the dual port memory has two independent data and address buses as shown in Fig. 2.5 and hence two memory accesses can be achieved in a clock period. Multiported memories dispense with the need for storing the program and data in two different memory chips in order to permit simultaneous access to both program and data memory. However, one of the major limitations of the dual ported memory is the increase in the cost compared to two single port memories of the same total capacity. This is because of the increased number of pins and larger chip area required for the dual ported memory. A larger number of I/O pins requires a larger and more expensive package and a larger die size.

Fig. 2.5 Block diagram of a dual ported memory
Some P-DSPs combine the modified Harvard architecture with the dual ported memories. For example, the Motorola DSP 561XX processors have a single ported program memory and a dual ported data memory. Hence one program memory access and two data memory accesses can be achieved per clock period.
VLIW ARCHITECTURE 2.5
Another architecture used for P-DSPs, for example in TMS320C6X, is the very long instruction word (VLIW) architecture. These P-DSPs have a number of processing units (data paths). In other words, they have a number of ALUs, MAC units, shifters, etc. The VLIW is accessed from memory and is used to specify the operands and operations to be performed by each of the data paths. As shown in Fig. 2.6, the multiple functional units share a common multiported register file for fetching the operands and storing the results. Parallel random access by the functional units to the register file is facilitated by the read/write cross bar. Execution of the operations in the functional units is carried out concurrently with the load/store operation of data between a RAM and the register file.
The performance gains that can be achieved with VLIW architecture depend on the degree of parallelism in the algorithm selected for a DSP application and the number of functional units. The throughput will be higher only if the algorithm involves execution of independent operations. For example, in the convolver of Fig. 2.1, by using eight functional units, the time required for convolution can be reduced by a factor of 8 compared to the case where a single functional unit is used.
However, it may not always be possible to have independent streams of data for processing. Further, the number of functional units is also limited by the hardware cost for the multiported register file and cross bar switch.

Fig. 2.6 Block diagram of the VLIW architecture
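The factor-of-8 reduction for convolution can be illustrated by distributing the independent multiply-accumulate operations of one output sample over several functional units. The Python sketch below is only a model of the idea; the function names are hypothetical, and the few extra cycles needed to add the partial sums together are ignored, as they are in the estimate above.

```python
def vliw_mac_cycles(taps, functional_units):
    """MAC cycles for one output sample when each unit does one MAC per cycle."""
    return -(-taps // functional_units)  # ceiling division


def parallel_convolution_sample(h, x, functional_units):
    """Split the products h[k] * x[k] over the units, then combine."""
    partial_sums = [0] * functional_units
    for k, (hk, xk) in enumerate(zip(h, x)):
        partial_sums[k % functional_units] += hk * xk  # unit k mod U
    return sum(partial_sums)


print(vliw_mac_cycles(64, 1), vliw_mac_cycles(64, 8))
print(parallel_convolution_sample([1, 2, 3, 4], [4, 3, 2, 1], 2))
```

The products are independent of one another, which is why the work divides evenly; an algorithm whose steps depend on earlier results would not benefit in the same way.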
PIPELINING 2.6
One of the approaches adopted for increasing the efficiency of the advanced microprocessors as well as P-DSPs is instruction pipelining. An instruction cycle, starting with the fetching of an instruction and ending with the execution of the instruction including the time for storage of the results, can be split into a number of microinstructions. Execution of each of the microinstructions is also referred to as one phase of an instruction. For example, an instruction cycle requiring four microinstructions can be said to be in four phases as follows:
1. Fetch phase in which the instruction is fetched from the program memory
2. Decode phase in which the instruction is decoded
3. Memory read phase in which the operand required for the execution of the instruction may be read from the data memory
4. Execution phase in which execution as well as the storage of the results in either one of the registers or memory is carried out
Each of the above microinstructions may be carried out separately by four functional units. Let us assume that each of the above four phases takes equal time for completion. In this case, in a conventional microprocessor with no pipelining, each of the functional units is busy only 25% of the time. This is because only one instruction is processed by the CPU at a time. Figure 2.7 shows when each of the functional units is busy when a program containing three instructions I1, I2, I3 is executed.
Fig. 2.7 Instruction cycles of processor with no pipelining
Fig. 2.8 Instruction cycles of processor with pipelining
The functional units can be kept busy almost all the time by processing a number of instructions simultaneously in the CPU. For example, in a machine with four functional units, four instructions I1, I2, I3 and I4 can be processed simultaneously as shown in Fig. 2.8. When I1 enters the decode phase, I2 can enter the opcode fetch phase. When I1 enters the operand read phase, I2 enters the decode phase and I3 enters the opcode fetch phase. When I1 enters the execute phase, I2 enters the operand read phase, I3 enters the decode phase and I4 enters the opcode fetch phase. The pipeline is fully loaded now and all the functional units have useful work to do. The instructions that follow I4 keep the functional units busy till the program is exited. Let T denote the time required for each phase of the instruction. One clock cycle of the processor corresponds to T. In a period of 12T only three instructions can be executed in a machine without pipelining. In the same period nine instructions can be executed as shown in Fig. 2.8. Hence the throughput is increased by a factor of 3 in this case.
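It may be noted that the initial latency of a machine with four phases is 4T. The first instruction completes after 4T and each of the remaining N - 1 instructions completes one clock cycle later. Hence for executing a program with N instructions, the time required for execution is (N + 3)T, which agrees with the nine instructions completed in 12T in Fig. 2.8. With a non-pipelined machine, the time required for executing N instructions is 4NT.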
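The cycle counts for the two machines can be checked with a short Python sketch. It assumes an ideal four-phase pipeline in which every phase takes one clock cycle T, the first instruction finishes after four cycles and every later instruction finishes one cycle after its predecessor, with no stalls or flushes. The function names are hypothetical.

```python
PHASES = 4  # fetch, decode, operand read, execute


def nonpipelined_cycles(n_instructions):
    """Each instruction occupies the CPU for all four phases."""
    return PHASES * n_instructions


def pipelined_cycles(n_instructions):
    """Fill latency for the first instruction, then one per cycle."""
    if n_instructions == 0:
        return 0
    return PHASES + (n_instructions - 1)


def instructions_completed(cycles, pipelined):
    """Number of instructions that have finished after the given cycles."""
    if pipelined:
        return max(0, cycles - PHASES + 1)
    return cycles // PHASES


print(instructions_completed(12, pipelined=False))  # without pipelining
print(instructions_completed(12, pipelined=True))   # with pipelining
```

For a long program the ratio of the two cycle counts approaches the number of phases, so the factor of 3 observed over 12T is a lower figure caused by the time spent filling the pipeline.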
The instruction pipeline shown in Fig. 2.8 corresponds to a highly optimistic case. In the case of processors requiring a single clock cycle for execution of each of the instructions in the program, the throughput shown in Fig. 2.8 can be achieved. This is normally achieved with reduced instruction set computers (RISC). However, in complex instruction set computers (CISC), there are also instructions with multiple words requiring multiple clock cycles for execution. In this case all the functional units cannot be kept busy all the time. For example, in the case of call and branch instructions of a P-DSP, four phases or T states are required for the call/branch instruction to exit the execution phase. By that time two more single word instructions or one double word instruction enter the instruction pipeline. These instructions should not be executed. Hence two words have to be flushed out of the instruction pipeline before the instructions are fetched starting from the new program address.
To overcome this problem, some of the P-DSPs have special branch/call and return instructions called delayed branch/call/return instructions. When the delayed branch instruction is executed, the program branches to the new program address only after the two 1-word instructions or the single 2-word instruction following the branch instruction are completely executed. Similarly, when the delayed call instruction is executed, the program calls the subroutine only after the two 1-word instructions or the single 2-word instruction following the call instruction are completely executed. When the delayed call/branch/return instructions are executed, there is no need for flushing the pipeline and maximum throughput is obtained. Examples of pipeline operation of delayed as well as undelayed branch/call instructions are given in Chapter 4.
The throughput efficiency of the pipeline may also be reduced because of conflicts between the instructions in the instruction pipeline in different phases. This happens if the same memory is used to store the data and program and there is only a single address bus for addressing both the program and data memory. This is true in the case of off-chip memory. For example, an instruction in the fetch phase may try to fetch the instruction code from a memory chip that is also accessed by another instruction that is in the operand read phase. To avoid the conflict, the operand read phase will be done first and the opcode fetch phase will be repeated till there is no conflict.
The number of instructions that are processed simultaneously in the CPU, also referred to as the depth of the instruction pipeline, differs in different families of P-DSPs. The pipeline depths of some of the P-DSPs are given in Table 2.1.

Table 2.1 Instruction pipeline depth of some P-DSPs
P-DSP Name/family        Pipeline Depth
Analog Devices           2
Motorola DSP5600X        3
TI TMS320C3X             4
TI TMS320C54X            6
SPECIAL ADDRESSING MODES IN P-DSPs 2.7
In addition to the addressing modes such as direct, indirect and immediate supported by the conventional microprocessors, P-DSPs have special addressing modes that permit a single word instruction format and thereby speed up the execution by making effective use of the instruction pipelining. Further, there are also special addressing modes such as circular addressing and bit reversed addressing that are specifically tailored for DSP applications. The details of these addressing modes are presented next.
2.7.1 Short Immediate Addressing
This permits the operand to be specified using a short constant that forms part of a single word instruction. The length of the short constant depends on the instruction type and the P-DSP. For example, in the case of TI TMS320C3X, an 8-bit constant can be specified as one of the operands in the single word instructions for addition, subtraction, AND, OR, XOR, etc.
2.7.2 Short Direct Addressing
This permits the lower order address of the operand of an instruction to be specified in the single word instruction. In the TI TMS320 DSPs, the higher order 9 bits of the memory address are stored in the data page pointer and only the lower 7 bits are specified as a part of the instruction. Each contiguous block of 128 words is referred to as one page in the TI DSPs. The argument in the instruction specifies only the location within the current page. In the Motorola DSP5600X, short direct memory addressing permits a 6-bit address to be specified in the instruction.
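The way the 9-bit data page pointer and the 7-bit field from the instruction combine into a 16-bit data memory address can be sketched in Python as follows. The function name direct_address is hypothetical and the sketch only models the address arithmetic, not any particular instruction encoding.

```python
PAGE_BITS = 9    # held in the data page pointer
OFFSET_BITS = 7  # carried in the single word instruction


def direct_address(data_page_pointer, offset):
    """Form the data memory address from page number and in-page offset."""
    if not 0 <= data_page_pointer < (1 << PAGE_BITS):
        raise ValueError("data page pointer must fit in 9 bits")
    if not 0 <= offset < (1 << OFFSET_BITS):
        raise ValueError("offset must fit in 7 bits")
    return (data_page_pointer << OFFSET_BITS) | offset


# Page 2 starts at word 256; offset 5 selects the sixth word of that page.
print(direct_address(2, 5))
```

Since only 7 address bits travel with the instruction, the instruction fits in a single word, at the cost of reloading the data page pointer whenever the program moves to a different 128-word page.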
2.7.3 Memory-mapped Addressing
The CPU registers and the I/O registers of the P-DSPs are also accessible as memory locations. This is achieved by mapping them into either the starting page or the final page of the memory space. For example, in TMS320C5X, page 0 corresponds to the CPU registers and I/O registers. In the case of Motorola DSP5600X, the last page of the memory space containing 64 locations is used as the memory map for the CPU and I/O registers. When these registers are accessed using memory-mapped addressing modes, the higher address bits are not taken from the data page pointer and are instead made to be 0 in the case of TI DSPs and made to be 1 in Motorola DSPs.
2.7.4 Indirect Addressing
In P-DSPs this addressing mode has a number of options. It permits an array of data to be processed in a P-DSP to be efficiently fetched and stored. The address of the operands can be stored in one of the registers called indirect address registers. In the case of TI processors the indirect address registers are called auxiliary registers (ARs). Any of these registers can be updated while the instruction whose operand was fetched using it is being executed. This is made possible by having an additional ALU in the CPU core specifically for the indirect address registers or ARs. The ARs may be incremented or decremented either in steps of 1 or in steps specified by the content of an offset register. In the case of TI processors, the offset register is called the INDX register. In the P-DSPs from Analog Devices it is called the modifier register. The content of the indirect address registers may also be updated by a constant using the bit reversed addressing mode explained in the next section. In the TI 5X processors the new address computed by the auxiliary ALU is not used for fetching the operand for the current instruction that is being decoded and executed. It is used for fetching the operand of the next instruction that uses the indirect addressing mode with this particular AR. For this reason, the indirect addressing mode used in TI 5X P-DSPs is called indirect addressing mode with post-increment/decrement. In Motorola DSP563XX, the updated indirect address register content may also be used to fetch the operand for the current instruction. Hence this mode is called the indirect addressing mode with pre-increment/decrement. In TI TMS320C54X processors both post-increment/decrement and pre-increment/decrement operations are supported.
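The difference between the post-modify and pre-modify variants lies only in which address is used for the current operand fetch. The Python sketch below models an indirect address register with both behaviours; the class and method names are hypothetical and do not correspond to the instruction syntax of any of the processors mentioned.

```python
class AddressRegister:
    """Model of an indirect address register updated by an auxiliary ALU."""

    def __init__(self, address):
        self.address = address

    def post_modify(self, step):
        """Use the current address for the fetch, then update the register."""
        used = self.address
        self.address += step
        return used

    def pre_modify(self, step):
        """Update the register first, then use the new address for the fetch."""
        self.address += step
        return self.address


memory = {100: 11, 101: 22, 102: 33}

ar = AddressRegister(100)
# Post-increment walks through an array starting at the first element.
print([memory[ar.post_modify(1)] for _ in range(3)])

ar = AddressRegister(100)
# Pre-increment skips the element the register initially points to.
print([memory[ar.pre_modify(1)] for _ in range(2)])
```

In both cases the address update costs no extra instruction, which is the purpose of providing a separate ALU for the address registers.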
2.7.5 Bit Reversed Addressing Mode
The bit reversed number representation is explained in Section 1.14. The binary pattern corresponding to a particular decimal number is obtained by writing the natural binary equivalent of the number in the reverse order so that the most significant bit of the natural binary number becomes the least significant bit of the bit reversed number and vice versa.
For the computation of the FFT, the data is to be arranged in the bit reversed order and the 2-point DFT of the resulting sequence is to be computed first. In the bit reversed addressing mode, when a 16-point FFT is to be computed, the 2-point DFT of X(0) and X(8) is to be found, similarly the 2-point DFT of X(4) and X(12), and so on. It may be noted from Table 1. that the values 0, 8, 4, 12 correspond to consecutive numbers in the bit reversed number representation. In the bit reversed addressing mode, the address is incremented/decremented by the number represented in the bit reversed form.
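The address sequence 0, 8, 4, 12, ... for a 16-point FFT can be generated in Python as shown below. The sketch assumes that the bit reversed increment is performed as an addition with the carry propagating from the most significant bit towards the least significant bit, which is modelled here by reversing the bits, adding normally and reversing again. The function names are hypothetical.

```python
def bit_reverse(value, bits):
    """Reverse the order of the lowest `bits` bits of value."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result


def bit_reversed_addresses(bits):
    """Addresses visited when stepping by N/2 with reverse carry."""
    n = 1 << bits
    increment = n >> 1  # N/2, i.e. 8 for a 16-point FFT
    address = 0
    sequence = []
    for _ in range(n):
        sequence.append(address)
        # Reverse-carry addition: add in the bit reversed domain.
        total = bit_reverse(address, bits) + bit_reverse(increment, bits)
        address = bit_reverse(total & (n - 1), bits)
    return sequence


print(bit_reversed_addresses(4))
```

The first four addresses produced for a 16-point transform are 0, 8, 4 and 12, matching the pairs X(0), X(8) and X(4), X(12) used for the first 2-point DFTs.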
2.7.6 Circular Addressing
In real time processing of signals, the input signal is continuously stored in the memory. The processed data is stored in another memory space continuously and may be written onto the output device. In this case the input as well as the output program will be simple. However, since the input as well as the output memory space will be finite in size, the entire memory space would be exhausted after processing the input signal for some time, if the data is written into the memory by using the linear addressing mode. One way to overcome this problem is to keep checking whether the range of either the input or the output memory space is exceeded. In that case, the new data is to be stored starting from the beginning of the particular memory space. However, checking this condition is an overhead that can be overcome using the circular addressing mode. In this mode, the memory can be organised as a circular buffer with the beginning memory address and the ending memory address corresponding to this buffer defined by the programmer. In the circular addressing mode, when the address pointer is incremented, the address will be checked with the ending memory address of the circular buffer. If it exceeds that, the address will be made equal to the beginning address of the circular buffer.
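The pointer update performed in circular addressing can be written out in Python as follows. In a P-DSP this comparison and wrap-around are done by the address generation hardware at no cost to the program; the sketch only makes the rule explicit. The function name and the example addresses are hypothetical.

```python
def circular_increment(pointer, begin, end):
    """Advance the pointer by one location inside the buffer [begin, end]."""
    pointer += 1
    if pointer > end:      # past the ending address of the circular buffer
        pointer = begin    # wrap to the beginning address
    return pointer


# A four-word circular buffer occupying addresses 0x100 to 0x103.
BEGIN, END = 0x100, 0x103
pointer = BEGIN
visited = []
for _ in range(6):
    visited.append(pointer)
    pointer = circular_increment(pointer, BEGIN, END)
print([hex(a) for a in visited])
```

After the fourth write the pointer returns to the beginning address, so the newest samples overwrite the oldest ones and the buffer never runs out of space.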
ON-CHIP PERIPHERALS 2.8
The P-DSPs have a number of on-chip peripherals that relieve the CPU from routine functions. Further, they also help to reduce the chip count of the DSP system built around the P-DSP. Some of the on-chip peripherals in the P-DSPs and their functions are as follows.
2.8.1 On-chip Timer
Two of the common applications of the timers are generation of periodic interrupts to the P-DSPs and generation of the sampling clocks for the A/D converters. The timer mode can be programmed by the P-DSPs. The timers can generate a single pulse or a periodic train of pulses. They can also generate a single square wave or a periodic square wave. The period of the timer is also made programmable.
2.8.2 Serial Port
This enables data communication between the P-DSP and an external peripheral such as an A/D converter, a D/A converter or an RS232C device. These ports normally have input and output buffers so that the P-DSP writes to or reads from the serial port in parallel form and the serial port sends and receives data to and from the peripherals in serial form. They also generate interrupts when the serial port output buffer is empty or the input buffer is full. These devices have parallel to serial and serial to parallel converters built into them. The shift clock can be fed either from the P-DSP or an external device can supply it. The serial ports can operate either in the asynchronous mode or in the synchronous mode. In the asynchronous mode, the transmit data and receive data lines alone are used for communication and no bit clock is transmitted from either end. In the case of synchronous mode, both the bit clock and a frame sync signal that indicates the beginning of the first bit of the data are transmitted from the serial port to the I/O device and also from the I/O port to the serial port. An example of the two signals with respect to the transmitted data is shown in Fig. 2.9.
Fig. 2.9 Burst mode serial port receive operation (FSR: receive frame sync, CLKR: receive clock, DR: receive data)
2.8.3 TDM Serial Port
The P-DSPs have a special serial port called the TDM serial port. This permits a P-DSP to communicate with other devices or P-DSPs by using time division multiplexing (TDM). One of the devices can generate the frame sync pulse that indicates the beginning of a TDM frame, and the bit clock, which defines the duration for which a bit is to be transmitted. As shown in Fig. 2.10 the TDM frame is split into a number of equal slots and each slot can be allotted to one of the devices.
CH1 | CH2 | CH3 | CH4 | CH5 | CH6 | CH7 | CH8
Fig. 2.10 TDM frame with 8 time slots
For example, in Fig. 2.10, there are 8 slots per frame and it is referred to as a TDM with eight channels. In each of the slots, a number of bits may be transmitted by a channel. The TDM serial port normally uses four lines for the purpose of serial communication. They are
TFRM: the frame sync signal
TCLK: the bit clock
TADD: the address of the serial device that is outputting data in a particular TDM slot
TDAT: the data transmitted into the TDM channel by the authorised device
The signals TADD and TDAT are bidirectional and are tristate controlled so that only one of the devices transmits the data and address on these lines at a time. Any one of the devices can generate the TFRM and TCLK signals and they are used by the other devices as a reference. A scheme where eight devices are interconnected using the TDM serial port is shown in Fig. 2.11. An example of how TI TMS320C5X can be configured to be one of the devices is shown in Fig. 2.12. An example of each of the devices outputting 16-bit data (D15 - D0) in its slot and also the address of the device (A0-A15) which is supposed to receive this data is shown in Fig. 2.13.
Fig. 2.11 Interconnecting 8 devices using TDM serial port using a 4-bit bus
Fig. 2.12 TMS320C5X configured to be one of the TDM devices
Fig. 2.13 Data transfer using TDM channel
2.8.4 Parallel Port
Parallel ports enable communication between the P-DSP and other devices to be faster compared to the serial communication by using a number of lines in parallel. In addition, they also have additional lines which are used for strobing or for handshaking purposes. The P-DSPs have two approaches for assigning lines for the parallel port. In one approach, used by TI, the data bus itself is used for parallel ports. This is achieved by allocating a specific address space for I/O, and whenever this address space is addressed using the I/O instructions, the parallel port signals including the handshaking signals are sent over the data bus. In the other approach, separate lines are dedicated for parallel ports including the handshaking signals.
2.8.5 Bit I/O Ports
The P-DSPs have additional I/O ports that are single bit wide. These port bits may be individually set, reset or read. These bits are normally used for control purposes but they can also be used for data transfer. There are no handshaking signals for these I/O ports. Some of these bits are also used for conditional branching or calls. For example, in TI processors there are instructions such as branch if I/O zero.
2.8.6 Host Port
The P-DSPs also have a special parallel port, normally 8-bit or 16-bit wide, called the host port that enables them to communicate with a microprocessor or PC, which is called the host. In addition to data communication, the host can generate interrupts and also cause the P-DSP to load a program from ROM to the RAM on reset. Almost all the P-DSPs including the ones from Analog Devices, Motorola and TI have host ports.
2.8.7 Comm Ports
These are parallel ports that are used for interprocessor communication between a number of identical P-DSPs in a multiprocessor system. For example, a multiprocessor system may be built using a number of TI TMS320C4X devices. For the purpose of communication of the data between these processors, six comm ports each of width 8 bits are provided. Since the data to be processed may be 32 or more bits wide, the P-DSPs have provision for splitting the data into streams of 8 bits and also for assembling the 8-bit units into words of 32 bits. Analog Devices DSP ADSP 2106X has 6 comm ports each of which is 4 bits wide.
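The splitting of a 32-bit word into 8-bit units for transmission over a comm port, and the reassembly at the receiving processor, can be sketched in Python as follows. The order in which the units are sent (least significant first) and the function names are assumptions made for this illustration; the actual order is fixed by the hardware of the particular P-DSP.

```python
def split_word(word, width=8, word_bits=32):
    """Split a word into width-bit units, least significant unit first."""
    mask = (1 << width) - 1
    return [(word >> shift) & mask for shift in range(0, word_bits, width)]


def assemble_word(units, width=8):
    """Rebuild the word from units received least significant unit first."""
    word = 0
    for position, unit in enumerate(units):
        word |= unit << (position * width)
    return word


units = split_word(0x12345678)
print([hex(u) for u in units])
print(hex(assemble_word(units)))
# A 4-bit wide port needs twice as many transfers for the same word.
print(len(split_word(0x12345678, width=4)))
```

With an 8-bit port a 32-bit word needs four transfers, and with a 4-bit port it needs eight, which is the trade-off between pin count and transfer time.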
2.8.8 On-Chip A/D and D/A Converters
Some of the P-DSPs targeted towards voice applications such as cellular telephones and tapeless answering machines have A/D and D/A converters inside the P-DSP. For example, Motorola DSP 561XX and Analog Devices ADSP 21MSP5X both have the A/D and D/A converters on chip and permit effective sampling rates of about 8 kHz.
2.8.9 P-DSPs with RISC and CISC
P-DSPs may be implemented using either the RISC processor or the CISC processor. For example, TI TMS320C6X P-DSPs use a RISC processor and a large number of P-DSPs from Analog Devices, Motorola and TI make use of CISC. For example, TI TMS320C54X, the Motorola DSP563XX and Analog Devices ADSP 2100X make use of CISC. TI TMS320C8X has a RISC and four P-DSPs with CISC in a single core. The relative advantages of each of these processors are as follows:
RISC Advantages
The chip area dedicated to the realisation of the control unit is considerably reduced because of the reduced number of instructions. About 20% of the chip area may be used for the control unit in RISC. In CISC processors about 30 - 40% of the chip area is used up for the control unit. Therefore in a RISC there is more area available for incorporating other features.
As a result of the considerable reduction in the control area, the CPU registers and the data paths (processing units) can be replicated and the throughput of the processor can be increased by applying pipelining and parallel processing.
In a RISC, all the instructions are of uniform length and take the same time for execution. Hence the dummy periods or hold periods in the instruction pipeline are reduced to the minimum. This increases the computational speed.
A simpler and smaller control unit in RISC has fewer gates. This reduces the propagation delay and increases the speed. The reduced number of instructions, formats and addressing modes results in a simpler and smaller decoder, which, in turn, increases the speed.
In RISC processors, the delayed branch and call instructions can be effectively used and they improve the speed.
HLL support Writing the programs in C and C++ relieves the programmer from learning the instruction
set of a P-DSP and lets the programmer concentrate on the application instead. It increases the throughput
of the programmer. Since RISC has a smaller number of instructions, the compiler for any HLL is shorter and
simpler. The availability of a relatively large number of CPU registers permits a more efficient code
optimisation by maximising the use of CPU registers over slower memories.
CISC Advantages
Some of the advantages of RISC also turn out to be disadvantages when viewed from a different
perspective. The CISC processors have a very rich instruction set that even supports high level language
constructs similar to "if condition true then do", "for" and "while". The P-DSPs with CISC also have
instructions specifically required for DSP applications such as MACD, FIRS, etc. This makes the
application program written in the assembly language shorter and easier to follow. Since RISC has a
smaller number of instructions, implementation of a single CISC instruction might require a number
of instructions in RISC. This increases the memory required for storing the program, and the traffic
between CPU and memory is increased. This on the one hand increases the computation time and on
the other hand makes the program difficult to debug.
The HLL compilers are costlier by several orders of magnitude compared to the P-DSPs themselves.
For P-DSPs with RISC architecture, compilers are essential. For most of the low cost applications, DSP
platforms without the compilers are preferred. Hence a majority of P-DSPs are CISC based. The P-DSP
manufacturers have tried to keep the codes for the new processors upward compatible with the older
processors. This makes the learning curve steeper.
The relative disadvantages of each of these architectures are diminishing. By making the RISC
processors applicable for larger and larger applications, the cost of the chip per se and the compiler costs
are being brought down. The HLL compilers for the CISC processors are also becoming as efficient as
hand assembly and the costs are coming down. Hence the distinction between the two in terms of cost
and debugging efficiency is likely to narrow down further. The Code Composer Studio from TI permits
programming in HLL as well as assembly language in a single development environment so that the
best features of both the HLL and assembly language programming can be used by the programmer.
Review Questions
2.1 Explain why a MAC operation is implemented in hardware in programmable DSPs.
2.2 Explain how convolution is performed using a single MAC unit.
2.3 Explain the difference between a MAC instruction and MAC with data shift instruction. When is the
latter instruction preferred?
2.4 Explain the difference between Von Neumann and Harvard architecture for the computer. Which
architecture is preferred for DSP applications and why?
2.5 Explain why the P-DSPs have multiple address and data buses for internal memory and peripherals but
have only a single address and data bus for the external memory and peripherals.
2.6 Explain the different techniques adopted for increasing the number of memory accesses/instruction
cycle.
2.7 Explain how a higher throughput is obtained using the VLIW architecture. Give an example of a DSP that
has VLIW architecture.
2.8 Explain what is meant by instruction pipelining. Explain with an example how pipelining increases the
throughput efficiency.
2.9 Explain how delayed branch/call instructions are superior to the undelayed branch/call instructions.
2.10 Explain the memory mapped addressing mode used in P-DSPs.
2.11 What are the different ways in which the operand for instructions can be specified using indirect
addressing mode?
2.12 What is meant by bit reversed addressing mode? What is the application for which this addressing mode
is preferred?
2.13 What is meant by circular addressing mode? What is the application for which this addressing mode is
preferred?
2.14 Mention some applications of on chip timer in P-DSPs.
2.15 Distinguish between the synchronous and asynchronous mode of operation of serial ports.
2.16 Explain the operation of TDM serial ports in P-DSPs.
2.17 What is the use of host ports in P-DSPs? How do they differ from the comm ports?
2.18 List the relative merits and demerits of RISC and CISC processors.
Self Test Questions
2.1 The features in which P-DSP is superior to advanced microprocessors is
(a) Low cost (b) Low power
(c) Computational speed (d) Real time I/O capability
2.2 In modified Harvard architecture for fetching the content of program and data memory a separate bus
is used for ______ memory and a single bus is used for ______ memory.
2.3 Number of memory accesses/clock period that can be achieved using on chip DARAM of a P-DSP is ______
@ 2 3 ws
2.4 VLIW architecture differs from conventional P-DSP in which of the following aspects:
(a) Instruction cache
(b) Number of functional units
(c) Use pipelining
(d) A single word fetched from memory has a number of instructions
2.5 A P-DSP has four pipeline stages and uses four phase clock. The number of clock cycles required for
executing a program with 25 instructions is ______
(29 (2825S
2.6 The number of instruction cycles required for executing a program in a microprocessor with no
pipelining is ______
@ 2 © Wa
2.7 The addressing mode that is convenient for FFT computation is ______
(a) Indirect addressing (b) Circular mode
(c) Bit reversed addressing (d) Memory mapped
2.8 The addressing that permits the content in internal register of the CPU & I/O to be accessed as memory
location is ______
(a) Indirect addressing (b) Circular mode
(c) Bit reversed addressing (d) Memory mapped
2.9 The serial port that permits the data from a number of I/O devices to be sent using a single serial port is
called ______
(a) Comm port
(b) Host port
(c) Time division multiplexing
(d) Bit I/O port
2.10 Which of the following characteristics are true for a RISC processor?
(a) Smaller control unit
(b) Small instruction set
(c) Short program length
(d) Less traffic between CPU & memory
ARCHITECTURE OF
TMS320C5X
INTRODUCTION 3.1
Leading manufacturers of integrated circuits such as Texas Instruments (TI), Analog Devices and
Motorola manufacture digital signal processor (DSP) chips. These manufacturers have developed
a range of DSP chips with varied complexity. The underlying concepts are broadly the same. Some of
these concepts are discussed in Chapter 2. In order to give a feel for the design of systems with DSP
chips, this chapter gives some details on the design of systems using the TMS320C5X DSP chip (denoted
in brief as 5X) manufactured by TI.
The TMS320 DSP family consists of two types of single-chip DSPs: 16-bit fixed-point and 32-bit
floating-point. These DSPs possess the operational flexibility of high-speed controllers and the numerical
capability of array processors. Combining these two qualities, the TMS320 processors are inexpensive
alternatives to custom fabricated VLSI and multichip bit-slice processors. TMS320C5X belongs to the
fifth generation of TI's TMS320 family of DSPs. The first five generations of the TMS320 family are
C1X, C2X, C3X, C4X and C5X. The C1X, C2X, C2XX and C5X are 16-bit fixed-point processors.
Instruction sets of the higher generation fixed-point processors are upward compatible with the lower
generation fixed-point processors. For example, C5X can execute the instructions of both C1X and C2X.
The C54X is upward compatible with C5X. C3X and C4X are 32-bit floating-point processors and C4X
is upward compatible with the C3X instruction set. The sixth generation C6X devices feature VelociTI™,
an advanced very long instruction word (VLIW) architecture developed by TI, and can execute 1600
MIPS. The eighth generation C8X devices have, on a single piece of silicon, a number of advanced
DSPs (ADSPs) and a RISC master processor. Typical applications of the above families of TI DSPs are
as follows:
C1X, C2X, C2XX, C5X, C54X: toys, hard disk drives, modems, cellular phones and active car
suspensions.
C3X: filters, analysers, hi-fi systems, voice mail, imaging, barcode readers, motor control, 3D
graphics or scientific processing.
C4X: parallel-processing clusters in virtual reality, image recognition, telecom routing, and parallel-
processing systems.
C6X: wireless base stations, pooled modems, remote-access servers, digital subscriber loop systems,
cable modems and multichannel telephone systems.
C8X: video telephony, 3D computer graphics, virtual reality and a number of multimedia applications.
The TI DSP chips have IC numbers with the prefix TMS320. If the next letter is C (e.g. TMS320C5X),
it indicates that CMOS technology is used for the IC and the on-chip non-volatile memory is a ROM.
If it is E (e.g. TMS320E5X), it indicates that the technology used is CMOS and the on-chip non-volatile
memory is an EPROM. If it is neither (e.g. TMS3205X), it indicates that NMOS technology is used
for the IC and the on-chip non-volatile memory is a ROM. Under C5X itself there are three processors,
'C50, 'C51 and 'C53, that have identical instruction sets but differ in the capacity of on-chip
ROM and RAM. The characteristics of some of the TMS320 family DSP chips are given in Table 3.1.
The instruction set of TMS320C5X and other DSP chips is superior to the instruction set of
conventional microprocessors such as 8085, Z80, etc., as most of the instructions require only a single
cycle for execution. The multiply accumulate operation used quite frequently in signal processing
applications such as convolution requires only one cycle in a DSP.
Table 3.1 Characteristics of some of the TMS320 family DSP chips
"C25 "C30 "C50 "C541
Cycle time (ns) 200 toa oO 2
on chip RAM. aK 4K 4K 2k 3k.
Total memory 4K 128K 1M 128K 138K
Parallel ports 8 16 1M 6K 6K
Architecture of TMS320C5X DSPs The block diagram of the internal architecture of 'C5X is shown
in Fig. 3.1. The 320C5X DSPs are said to have advanced Harvard architecture because they have
separate memory bus structures for program and data and have instructions that enable data transfer between
the program and data memory areas.
BUS STRUCTURE 3.2
Separate program and data buses allow simultaneous access to program instructions and data, providing
a high degree of parallelism. For example, while data is multiplied, a previous product can be loaded
into, added to or subtracted from the accumulator and, at the same time, a new address can be generated.
Such parallelism supports a powerful set of arithmetic, logic and bit-manipulation operations that can
all be performed in a single machine cycle. In addition, the 'C5X includes the control mechanisms to
manage interrupts, repeated operations and function calling. The 'C5X architecture has four buses and
their functions are as follows:
Program bus (PB) It carries the instruction code and immediate operands from program memory
space to the CPU.
Program address bus (PAB) It provides addresses to program memory space for both reads and
writes.
Data read bus (DB) It interconnects various elements of the CPU to data memory space.
Data read address bus (DAB) It provides the address to access the data memory space. The program
and data buses can work together to transfer data from on-chip data memory and internal or
external program memory to the multiplier for single-cycle multiply/accumulate operations.
multiplier unit in the C5X processors performs 16 x 16 multiplication of numbers represented in 2's
complement form. The 32-bit PREG holds the result of multiplication. The 16-bit temporary register 0
(TREG0) holds the multiplicand. The other operand for the multiplication can be specified using one of
the addressing modes.
The 0-16-bit left barrel shifter and right barrel shifter in the CALU permit the contents of memory to be
left shifted by 0 to 16 bits before they are either fed to the ALU or stored from the ALU to memory. The CPU
registers ACC and PREG can also be shifted using these shifters. In this case they require two cycles.
A 5-bit register TREG1 specifies the number of bits by which the scaling shifter should shift either the
incoming data to one of the CPU registers or vice versa. When the incoming data to the CPU is left shifted
by the scaling shifter, the LSBs are filled with 0.
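The multiplier and scaling shifter behaviour described above can be sketched in a few lines of Python. This is only an illustrative model, not TI code: the function names `to_signed`, `multiply_16x16` and `scale_left` are hypothetical, and the handling of the upper bits under sign extension is deliberately left out.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a 2's complement number."""
    value &= (1 << bits) - 1
    if value & (1 << (bits - 1)):
        return value - (1 << bits)
    return value


def multiply_16x16(treg0, operand):
    """Signed 16 x 16 multiply; the 32-bit result models the PREG contents."""
    product = to_signed(treg0, 16) * to_signed(operand, 16)
    return product & 0xFFFFFFFF


def scale_left(word16, shift):
    """Left shift a 16-bit memory word by 0 to 16 bits, filling the LSBs with 0."""
    if not 0 <= shift <= 16:
        raise ValueError("shift must be between 0 and 16")
    return ((word16 & 0xFFFF) << shift) & 0xFFFFFFFF


preg = multiply_16x16(0xFFFE, 0x0003)   # (-2) * 3 in 2's complement
print(hex(preg), to_signed(preg, 32))
print(hex(scale_left(0x0001, 4)))
```

The point of the sketch is that a 16 x 16 signed product always fits in 32 bits, which is why PREG is 32 bits wide.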
AUXILIARY REGISTER ALU (ARAU) 3.4
It consists of eight 16-bit auxiliary registers (ARs) AR0-AR7, a 3-bit auxiliary register pointer (ARP)
and an unsigned 16-bit ALU. The ARAU calculates indirect addresses by using inputs from the ARs, the 16-bit index
register (INDX) and the auxiliary register compare register (ARCR). The ARAU can autoindex the current
AR while the data memory location is being addressed and can index either by ±1 or by the contents of
the INDX. As a result, accessing data does not require the CALU for address manipulation; therefore,
the CALU is free for other operations in parallel. This makes the instructions execute faster
compared to the conventional microprocessors. For example, let us consider the following sequence of
8085 instructions:
MOV A, M
INX H
These instructions enable the accumulator to be loaded using indirect addressing mode, and the HL
register used as the address pointer is incremented. These two instructions can be replaced by a single
5X instruction LACC *+, 0.
Further, any one of the auxiliary registers can be used as the address pointer and incremented by the
above instruction. The register that will be used is specified by the content of the ARP.
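A minimal Python sketch of this load with post-increment is given below. The class `Arau` and its method name are hypothetical and only model the behaviour described in the text: the AR selected by the ARP supplies the address, and the same AR is incremented afterwards without involving the accumulator path.

```python
class Arau:
    """Toy model of the auxiliary register file and its pointer."""

    def __init__(self):
        self.ar = [0] * 8   # AR0 to AR7, each 16 bits wide
        self.arp = 0        # 3-bit auxiliary register pointer

    def read_post_increment(self, memory):
        """Read memory at the current AR, then increment that AR by 1."""
        address = self.ar[self.arp]
        value = memory[address]
        self.ar[self.arp] = (address + 1) & 0xFFFF
        return value


memory = {0x0100: 11, 0x0101: 22}
arau = Arau()
arau.arp = 3               # select AR3 as the address pointer
arau.ar[3] = 0x0100
acc = arau.read_post_increment(memory)   # models LACC *+, 0
print(acc, hex(arau.ar[3]))
```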
The auxiliary registers AR0-AR7 may also be used as general purpose registers for holding the
operands for arithmetic and logical operations in the CALU. Some of the other registers of the ARAU and their
functions are as follows:
INDEX REGISTER (INDX) 3.5
The 16-bit INDX is used by the ARAU as a step value (addition or subtraction by more than 1) to
modify the address in the ARs during indirect addressing. For example, when the ARAU steps across
a row of a matrix, the indirect address is incremented by 1. However, when the ARAU steps down a
column, the address is incremented by the dimension of the matrix. The ARAU can add or subtract the
value stored in the INDX from the current AR as part of the indirect address operation. INDX can also
map the dimension of the address block used for bit-reversal addressing.
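The row and column stepping can be illustrated with a short Python sketch. It assumes a square matrix stored row by row starting at a hypothetical base address; the helper `walk` is an illustration, not a 'C5X instruction.

```python
def walk(start, step, count):
    """Addresses visited when an AR starts at `start` and is post-modified by `step`."""
    return [(start + i * step) & 0xFFFF for i in range(count)]


N = 4            # assumed matrix dimension
BASE = 0x0200    # assumed base address of the matrix in data memory

row0 = walk(BASE, 1, N)   # stepping across a row: increment by 1
col0 = walk(BASE, N, N)   # stepping down a column: increment by INDX = N

print([hex(a) for a in row0])
print([hex(a) for a in col0])
```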
AUXILIARY REGISTER COMPARE REGISTER (ARCR) 3.6
The 16-bit ARCR is used for address boundary comparison. The CMPR instruction compares the ARCR
to the selected AR and places the result of the compare in the TC bit of ST1.
BLOCK MOVE ADDRESS REGISTER (BMAR) 3.7
The 16-bit BMAR holds an address value to be used with block moves and multiply/accumulate
operations. This register provides the 16-bit address for an indirect-addressed second operand.
BLOCK REPEAT REGISTERS (RPTC, BRCR, PASR, PAER) 3.8
All these registers are 16 bits wide. The repeat counter register (RPTC) holds the repeat count in a repeat
single-instruction operation and is loaded by the RPT and RPTZ instructions. The block repeat counter
register (BRCR) holds the count value for the block repeat feature. This value is loaded before a block
repeat operation is initiated. The block repeat program address start register (PASR) indicates the 16-bit
address where the repeated block of code starts. The block repeat program address end register (PAER)
indicates the 16-bit address where the repeated block of code ends. The PASR and PAER are loaded by
the RPTB instruction.
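How the three block repeat registers cooperate can be sketched as a small program counter simulation. This is a simplified model under stated assumptions: the function `run_block_repeat` is hypothetical, instructions are taken to occupy one address each, and the block is assumed to run BRCR + 1 times in total.

```python
def run_block_repeat(pasr, paer, brcr, last_address):
    """Return the sequence of program addresses executed.

    When the PC reaches PAER and BRCR is non-zero, BRCR is decremented
    and the PC is reloaded with PASR; otherwise the PC simply advances.
    """
    trace = []
    pc = 0
    while pc <= last_address:
        trace.append(pc)
        if pc == paer and brcr > 0:
            brcr -= 1
            pc = pasr
        else:
            pc += 1
    return trace


trace = run_block_repeat(pasr=2, paer=3, brcr=2, last_address=5)
print(trace)
```

No branch instruction appears inside the loop body, which is the practical benefit of keeping the start address, end address and count in dedicated registers.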
PARALLEL LOGIC UNIT (PLU) 3.9
It performs Boolean operations or the bit manipulations required of high-speed controllers. The PLU
can set, clear, test or toggle bits in a status register, control register, or any data memory location. The
PLU allows logic operations to be performed on data memory values directly without affecting the
contents of the ACC or PREG. Results of a PLU function are written back to the original data memory
location.
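The set, clear, toggle and test operations can be sketched in Python as read-modify-write operations on a data memory word. The function names below are hypothetical; the only property being illustrated is that the accumulator is never involved and the result goes back to the same location.

```python
def plu_set(memory, address, mask):
    """Set the bits selected by mask (OR with the memory word)."""
    memory[address] = (memory[address] | mask) & 0xFFFF


def plu_clear(memory, address, mask):
    """Clear the bits selected by mask (AND with the inverted mask)."""
    memory[address] = memory[address] & ~mask & 0xFFFF


def plu_toggle(memory, address, mask):
    """Toggle the bits selected by mask (XOR with the memory word)."""
    memory[address] = (memory[address] ^ mask) & 0xFFFF


def plu_test(memory, address, bit):
    """Return True if the given bit of the memory word is 1."""
    return bool(memory[address] & (1 << bit))


data = {0x0060: 0x00F0}
acc = 0x12345678             # stand-in accumulator, untouched by the PLU model

plu_set(data, 0x0060, 0x0001)
plu_clear(data, 0x0060, 0x0010)
plu_toggle(data, 0x0060, 0x8000)
print(hex(data[0x0060]), plu_test(data, 0x0060, 15), hex(acc))
```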
MEMORY-MAPPED REGISTERS 3.10
The 'C5X has 96 registers mapped into page 0 of the data memory space. All 'C5X DSPs have 28 CPU
registers and 16 input/output (I/O) port registers but have different numbers of peripheral and reserved
registers. Since the memory-mapped registers are a component of the data memory space, they can be
written to and read from in the same way as any other data memory location. The memory-mapped
registers are used for indirect data address pointers, temporary storage, CPU status and control, or
integer arithmetic processing through the ARAU.
PROGRAM CONTROLLER 3.11
The program controller contains logic circuitry that decodes the instructions, manages the CPU pipeline,
stores the status of CPU operations and decodes the conditional operations. Parallelism of architecture lets
the 'C5X perform three concurrent memory operations in any given machine cycle: fetch an instruction,
read an operand and write an operand. The program controller consists of the following elements:
16-bit program counter (PC)
16-bit status registers ST0, ST1, processor mode status register (PMST) and circular buffer control
register (CBCR)
(8 x 16)-bit hardware stack
Address generation logic
Instruction register
Interrupt flag register and interrupt mask register
Fig. 3.2(a) Status register 0 (ST0) bit assignment
SOME FLAGS IN THE STATUS REGISTERS 3.12
The status registers can be stored into data memory and loaded from data memory, thereby allowing the
'C5X status to be saved and restored for subroutines. The ST0 and ST1 each have an associated 1-level
deep shadow register stack for automatic context-saving when an interrupt trap is taken. These registers
are automatically restored upon a return from interrupt.
The bit assignment details for ST0 and ST1 are given in Fig. 3.2. Significance of the various bits of
ST0 and ST1 are as follows:
ARP (Auxiliary Register Pointer) These bits select the AR to be used in indirect addressing. When the
ARP is loaded, the previous ARP value is copied to the auxiliary register buffer (ARB) in ST1.
OV (Overflow) flag bit This bit indicates that an arithmetic operation has overflowed in the ALU.
OVM (Overflow Mode) bit This bit enables/disables the accumulator overflow saturation mode in the
ALU.
INTM (Interrupt Mode) bit This bit globally masks or enables all interrupts. The INTM bit has no
effect on the non-maskable RS and NMI interrupts.
DP (Data Memory Page Pointer) bits These bits specify the address of the current data memory page.
The DP bits are concatenated with the 7 LSBs of an instruction word to form a direct memory address
of 16 bits.
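The formation of the direct memory address from the DP bits can be sketched as follows. Since the address is 16 bits and the instruction word supplies 7 of them, the DP field is 9 bits wide. The function name `direct_address` and the sample instruction word are illustrative assumptions.

```python
def direct_address(dp, instruction_word):
    """Concatenate the 9-bit DP with the 7 LSBs of the instruction word."""
    if not 0 <= dp < 512:
        raise ValueError("DP is a 9-bit field (0 to 511)")
    offset = instruction_word & 0x7F      # 7 LSBs select a word within the page
    return (dp << 7) | offset             # 16-bit data memory address


# Page 4, offset 5 taken from a made-up instruction word
print(hex(direct_address(4, 0x2005)))
```

Each data page therefore holds 128 words, and 512 pages cover the 64K-word data memory space.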
Fig. 3.2(b) Status register 1 (ST1) bit assignment
ARB Auxiliary Register Buffer
This 3-bit field holds the previous value contained in the ARP in ST0. Whenever the ARP is loaded, the
previous ARP value is copied to the ARB, except when using the LST #0 instruction. When the ARB
is loaded using the LST #1 instruction, the same value is also copied to the ARP. This is useful when
restoring context (when not using the automatic context save) in a subroutine that modifies the current
ARP.
CNF On-chip RAM configuration control bit This 1-bit field enables the on-chip dual-access RAM
block 0 (DARAM B0) to be addressable in data memory space or program memory space. The CNF bit
can be modified by the LST #1 instruction. If CNF is 0, the on-chip DARAM block 0 is mapped into
data memory space. The CNF bit can be cleared by a reset or the CLRC CNF instruction. When CNF is
1, the on-chip DARAM block 0 is mapped into program memory space. The CNF bit can be set by the
SETC CNF instruction.
TC Test/control flag bit This 1-bit flag stores the results of the ALU or parallel logic unit (PLU) test
bit operations. The status of the TC bit determines if the conditional branch, call and return instructions
are to be executed.
SXM Sign-extension mode bit This 1-bit field enables/disables sign extension of an arithmetic
operation. The SXM bit does not affect the operations of certain arithmetic or logical instructions: the ADDC,
ADDS, SUBB or SUBS instruction suppresses sign extension, regardless of SXM.
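The effect of sign extension can be illustrated with a 16-bit word being widened to a 32-bit accumulator value. The function `load_to_acc` is a hypothetical helper and not a model of any particular instruction; it only shows what the upper 16 bits look like with sign extension enabled and disabled.

```python
def load_to_acc(word16, sxm):
    """Widen a 16-bit word to 32 bits, sign extending only when sxm is set."""
    word16 &= 0xFFFF
    if sxm and (word16 & 0x8000):
        return 0xFFFF0000 | word16   # negative value: upper bits filled with 1
    return word16                    # otherwise upper bits are 0


print(hex(load_to_acc(0x8001, True)))    # treated as a negative number
print(hex(load_to_acc(0x8001, False)))   # treated as an unsigned number
```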
C Carry bit This 1-bit field indicates an arithmetic operation carry or borrow in the ALU. The single-
bit shift and rotate instructions affect the C bit.
HM Hold mode bit This 1-bit field determines whether the central processing unit (CPU) stops or
continues execution when acknowledging an active HOLD signal.
XF XF pin status bit This 1-bit field determines the level of the external flag (XF) output pin.
PM Product shift mode bits This 2-bit field determines the product shifter (P-SCALER) mode and
shift value for the PREG output into the ALU. Table 3.2 gives the PM bits and the function performed.
Table 3.2 PM bits and the function performed
PM bits (b1 b0)   P-SCALER mode for PREG output
0 0               No shift
0 1               Left-shifted 1 bit; LSB zero-filled
1 0               Left-shifted 4 bits; 4 LSBs zero-filled
1 1               Right-shifted 6 bits; sign extended; 6 LSBs lost. The product is always sign extended,
                  regardless of the value of the SXM bit
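The four product shift modes of Table 3.2 can be written out as a small Python function. This is a sketch of the table only: the name `p_scaler` is hypothetical, and PREG is modelled as an unsigned 32-bit integer holding a 2's complement value.

```python
def p_scaler(preg, pm):
    """Apply the product shift selected by the 2-bit PM field to a 32-bit PREG value."""
    preg &= 0xFFFFFFFF
    if pm == 0b00:
        return preg                          # no shift
    if pm == 0b01:
        return (preg << 1) & 0xFFFFFFFF      # left 1 bit, LSB zero-filled
    if pm == 0b10:
        return (preg << 4) & 0xFFFFFFFF      # left 4 bits, 4 LSBs zero-filled
    if pm == 0b11:
        # right 6 bits with sign extension; the 6 LSBs are lost
        signed = preg - (1 << 32) if preg & 0x80000000 else preg
        return (signed >> 6) & 0xFFFFFFFF
    raise ValueError("PM is a 2-bit field")


print(hex(p_scaler(0x00000003, 0b01)))
print(hex(p_scaler(0xFFFFFFC0, 0b11)))   # -64 shifted right by 6 gives -1
```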
ON-CHIP MEMORY 3.13
The 'C5X architecture contains a considerable amount of on-chip memory to aid in system performance
and integration:
Program Read-Only Memory (ROM)
Data/Program Dual-Access RAM (DARAM)
Data/Program Single-Access RAM (SARAM)
The 'C5X has a total address range of 224K words x 16 bits. The memory space is divided into
four individually selectable memory segments: 64K-word program memory space, 64K-word local data
memory space, 64K-word I/O ports and 32K-word global data memory space.
3.13.1 Program ROM
All 'C5X DSPs carry a 16-bit on-chip maskable programmable ROM (see Fig. 3.1 for sizes). Some of
the 'C5X DSPs have boot loader code resident in the on-chip ROM, and the other 'C5X DSPs offer
the boot loader code as an option. This memory is used for booting program code from slower external
ROM or EPROM to fast on-chip or external RAM. Once the custom program has been booted into
RAM, the boot ROM space can be removed from program memory space by setting the MP/MC bit
in the processor mode status register (PMST). The on-chip ROM is selected at reset by driving the
MP/MC pin low. If the on-chip ROM is not selected, the 'C5X devices start execution from off-chip
memory.
3.13.2 Data/Program Dual-Access RAM
All 'C5X DSPs carry a 1056-word x 16-bit on-chip dual-access RAM (DARAM). The DARAM is
divided into three individually selectable memory blocks: 512-word data or program DARAM block
B0, 512-word data DARAM block B1 and 32-word data DARAM block B2. The DARAM is primarily
intended to store data values but, when needed, can be used to store programs as well. DARAM blocks
B1 and B2 are always configured as data memory; however, DARAM block B0 can be configured by
software as data or program memory.
DARAM improves the operational speed of the 'C5X CPU. The CPU operates with a 4-deep pipeline.
In this pipeline, the CPU reads data on the third stage and writes data on the fourth stage. Hence, for
a given instruction sequence, the second instruction could be reading data at the same time the first
instruction is writing data. The dual data buses (DB and DAB) allow the CPU to read from and write to
DARAM in the same machine cycle.
3.13.3 Data/Program Single-Access RAM
Almost all 'C5X DSPs carry a 16-bit on-chip single-access RAM (SARAM) of sizes varying from
1K to 9K (16-bit) words. Code can be booted from an off-chip ROM and then executed at full speed once
it is loaded into the on-chip SARAM. The SARAM can be configured by software as data memory, as
program memory or as a combination of both data memory and program memory. The SARAM is divided
into 1K- and/or 2K-word blocks contiguous in address memory space. All 'C5X CPUs support parallel
accesses to these SARAM blocks. However, one SARAM block can be accessed only once per machine
cycle. In other words, the CPU can read from or write to one SARAM block while accessing another
SARAM block.
3.13.4 On-Chip Memory Protection
The 'C5X DSPs have a maskable option that protects the contents of on-chip memories. When the
related bit is set, no externally originating instruction can access the on-chip memory spaces.
ON-CHIP PERIPHERALS 3.14
All 'C5X DSPs have the same CPU structure; however, they have different on-chip peripherals connected
to their CPUs. The 'C5X DSP on-chip peripherals available are as follows:
Clock Generator
Hardware Timer
Software-Programmable Wait-State Generators
Parallel I/O Ports
Host Port Interface (HPI)
Serial Port
Buffered Serial Port (BSP)
Time-Division Multiplexed (TDM) Serial Port
User-Maskable Interrupts
3.14.1 Clock Generator
The clock generator consists of an internal oscillator and a phase-locked loop (PLL) circuit. The clock
generator can be driven internally by a crystal resonator circuit or driven externally by a clock source.
The PLL circuit can generate an internal CPU clock by multiplying the clock source by a specific factor,
and so a clock source with a frequency lower than that of the CPU can be used.
3.14.2 Hardware Timer
A 16-bit hardware timer with a 4-bit prescaler is available. This programmable timer clocks at a rate
that is between 1/2 and 1/32 of the machine cycle rate (CLKOUT1), depending upon the timer's divide-
down ratio. The timer can be stopped, restarted, reset or disabled by specific status bits. Three registers
control and operate the timer. The timer counter register (TIM) gives the current count of the timer. The
timer period register (PRD) defines the period for the timer. The 16-bit timer control register (TCR)
controls the operations of the timer.
3.14.3 Software-Programmable Wait-State Generators
Software-programmable wait-state logic is incorporated in 'C5X DSPs, allowing wait-state generation
without any external hardware for interfacing with slower off-chip memory and I/O devices. This
feature consists of multiple wait-state generating circuits. Each circuit is user-programmable to operate
in different wait states for off-chip memory accesses.
3.14.4 Parallel I/O Ports
A total of 64K I/O ports are available; 16 of these ports are memory-mapped in data memory space.
Each of the I/O ports can be addressed by the IN or the OUT instruction. The memory-mapped I/O ports
can be accessed with any instruction that reads from or writes to data memory. The IS signal indicates
a read or write operation through an I/O port. The 'C5X can easily interface with external I/O devices
through the I/O ports while requiring minimal off-chip address decoding circuits.
3.14.5 Host Port Interface (HPI)
The HPI is available on the 'C57S and 'LC57. It is an 8-bit parallel I/O port that provides an interface
to a host processor. Information is exchanged between the DSP and the host processor through on-chip
memory that is accessible to both the host processor and the 'C57.
3.14.6 Serial Port
Three different kinds of serial ports are available: a general-purpose serial port, a time-division
multiplexed (TDM) serial port and a buffered serial port (BSP). Each 'C5X contains at least one general-
purpose, high-speed synchronous, full-duplexed serial port interface that provides direct communication
with serial devices such as codecs, serial analog-to-digital (A/D) converters and other serial systems.
The serial port is capable of operating at up to one-fourth the machine cycle rate (CLKOUT1). The
serial port transmitter and receiver are double-buffered and individually controlled by maskable external
interrupt signals. Data is framed either as bytes or as words.
Five 16-bit registers (SPC, DRR, DXR, XSR, RSR) control and operate the serial port interface. The
serial port control (SPC) register contains the mode control and status bits of the serial port. The data
receive register (DRR) holds the incoming serial data, and the data transmit register (DXR) holds the
outgoing serial data. The data transmit shift register (XSR) controls the shifting of the data from the
DXR to the output pin. The data receive shift register (RSR) controls the storing of the data from the
input pin to the DRR.
3.14.7 Buffered Serial Port (BSP)
The BSP is available on the 'C56 and 'C57 devices. It is a full-duplexed, double-buffered serial port and
an autobuffering unit (ABU). The BSP provides flexibility on the data stream length. The ABU supports
high-speed data transfer and reduces interrupt latencies. The BSP has a 2K-word buffer, which resides
in the 'C5X internal memory. Five BSP registers control and operate the BSP.
3.14.8 TDM Serial Port
The TDM serial port available on the 'C50, 'C51 and 'C53 devices is a full-duplexed serial port that can
be configured by software either for synchronous operations or for time-division multiplexed operations.
The TDM serial port is commonly used in multiprocessor applications.
3.14.9 User-Maskable Interrupts
Four external interrupt lines (INT1-INT4) and five internal interrupts, a timer interrupt and four serial
port interrupts, are user maskable. When an interrupt service routine (ISR) is executed, the contents
of the program counter are saved on an 8-level hardware stack, and the contents of 11 specific CPU
registers, ACC, ACCB, PREG, ST0, ST1, PMST, TREG0, TREG1, TREG2, INDX and ARCR, are
saved in a one-deep stack (shadow registers). When a return from interrupt instruction is executed, the
CPU registers' contents are restored.
Re
3A Mention few epplications of eachof the families of
TIDSPs
3.2 What are the different buses of TMSI20C5X and
their functions?
33 List the functional units in CALL of SX and explain
the source and destination of operands of each cf these
3.4 Listthe various registers used with the ARAU and
theirfunctions.
3.5. What Is meant by memory mapped register? How
isitcifferent fom a memory?
w Questions |-—-_—_-——________—
45.6 List status register bits of SX and thelr funetions.
3.7 Distinguish between the duskaccess 8AM and
single access RAM used inthe on-chip memory of &.
BB List the on-chip peripherals in SX and their
functions
3.9. What are the various interrupt types supported by
sx
210 Drawthe internal architecture diagram of SX and
Indicate the various blocs.
Self Test Questions {| ——_——__
3A The 320C5X DSPs are said to have ava
architecture because
(a) they have separate memory bus structures for
program and data
(b)they have instructions that enable data transter
betiveen the program and data memory area
(©) they have seme memory bus structures for program
and data
(a) the contents of program memory canrot into the
dato memory orvice verse
Harvard
3.2. The central ALL of COX DSP processors have
bit ALU and one ofthe operand’ for the ALU operation
comes from ——~.
(2)32,ACC ()IGACC (@)32,ACCR (a) 6ACCE
33. The cewit of operations performed in central ALU
ae stoted in —
(ACC — (BACB (@)TREGO (a) PREG
34 The ALU register whose cither higher order word
Cr lower order word can be loaded from memory is.80 tt Digta! Signal Processors
(a)acc (D)ACCB —(c) TREGO (d) PREG
35 The ——— bit register used for temporary storage
of scumulatoris
(a) 32, PREG (b) 32, ACCB (c) 16, TREGO, (a) 32, ACC
3.6 The ——— permits execution of ogi operations
on data without atecing the comtentsof ACC
6) parallel lope unt () alory ALU
(6) centel ALU
37 Trehardnoremuliplie unin the C5 processors
perform multiplication of times ——— bit
represented in ——— complement forn,
(1G 16.15 @)BEIE (2.16 16,25 (AB, 8,28
38 holds the resut of multiplication and is
—— bitwide,
(PREG, 32
(6) TREGO, 16(4) TREGO, 32
3.9 The register in which the multiplicand is stored
(prec, 16
bofore mukipliation is performed iz and ie
bit wide.
(@) PREG, 32 (e) PREG, 16
(6) TREGO, 16(¢) TREGO, 32,
3.10 ——— permits the contents of memory tobe let
shifted by O-16 bits bore they are either fed to ALU oF
stored from ALU te memory.
(2) Scaling shitter (aw
(nu (Auaiary ALU
3.11 The register that specifies the number of bits by
which the scaling shifter should shift either the incoming
data to one of the CPU registers or vice versa is ______
and is ______ bit wide.
(a) TREG1, 4 (b) TREG1, 5 (c) TREG2, 5 (d) TREG2, 4
3.12 When the incoming data to the CPU is left shifted by
the scaling shifter, the LSBs are filled with
(a) 0 (b) 1 (c) LSB before shifting
3.13 The bit of status register ST1 which determines
whether the MSBs of the bits left shifted by the scaling
shifter are zero or are sign extended is ______
(a) SXM (b) TC (c) OV (d) OVM
3.14 In the hardware stack of C5X processors ______
numbers can be stored.
GI OIE —|RR — AIG
3.15 The bit of status register 0 (ST0) that becomes 1 if
overflow occurs from an ALU operation is
(a) SXM (b) OV (c) OVM (d) TC
3.16 The bit of ST0 that determines whether the ACC is
replaced with either the largest positive or negative number
or left unmodified is
(a) SXM (b) OV (c) OVM (d) TC (e) C
3.17 The bit of ST1 that is used for testing whether a
particular memory is zero or not, or for comparing one
register against another register or memory, is ______
(a) SXM (b) OV (c) OVM (d) TC (e) C
3.18 The bit of ST1 that becomes 1 if either addition
generates a carry or subtraction results in a borrow is
(a) SXM (b) OV (c) OVM (d) TC (e) C
3.19 The status register bit that determines whether the
multiplier's 32-bit product is left shifted by 0, 1 or 4 bits, or right
shifted by 6 bits with sign extension, before it is transferred
or added to the ACC is ______
(a) PM (b) CNF (c) HM (d) XF
(e) INTM
3.20 The RAM configuration control bit that indicates
whether the on-chip reconfigurable dual access RAM is
mapped to data space or program space is
(a) PM (b) CNF (c) HM (d) XF
(e) INTM
3.21 The bit of the status register that determines whether
the processor halts the internal operation while
acknowledging a hold or not is
(a) PM (b) CNF (c) HM (d) XF
(e) INTM
3.22 The ______ bit of the status register indicates the
status of the general purpose output pin.
(a) PM (b) CNF (c) HM (d) XF
(e) INTM
3.23 The pointers that are contained in the status
register 0 are
(a) ARP (b) DP (c) ARB (d) IPTR
(e) INTM
3.24 The pointers that are contained in the status
register 1 are ______
(a) ARP (b) DP (c) ARB (d) IPTR
(e) INTM
3.25 If the ______ bit is set to 0, all unmasked interrupts
are enabled. Otherwise all the maskable interrupts are
disabled.
(a) ARP (b) DP (c) ARB (d) IPTR
(e) INTM