Unit 5

Digital Signal Processing architecture
INTRODUCTION TO PROGRAMMABLE DSPs

The programmable digital signal processors (P-DSPs) are designed with features that are specifically required for digital signal processing applications. The conventional microprocessors are meant for general purpose applications and hence they do not have these features. However, an advanced microprocessor or a RISC processor may use some of the techniques adopted in P-DSPs or may even have instructions that are specifically required for DSP applications. They may have performances close to that of a P-DSP for certain operations. For example, the DEC Alpha 21064 computes a 1024-point complex FFT in 480 µs, as compared to the Analog Devices ADSP 21060, which takes about 450 µs to carry out the same operation. However, in terms of low power requirement, cost, real time I/O capability and availability of high speed on-chip memories, the P-DSPs have an advantage over the advanced microprocessors and the RISC processors. In this chapter some of the features specifically required for performing digital signal processing operations efficiently are discussed in detail.

MULTIPLIER AND MULTIPLIER ACCUMULATOR (MAC) 2.1

One of the most common operations required in digital signal processing applications is array multiplication. For example, convolution and correlation require array multiplication. In Chapter 1, it was shown how the array multiplication can be done using a single multiplier and adder. The implementation scheme is reproduced in Fig. 2.1. One of the important requirements of these array multipliers is that they have to process the signals in real time. Before the next sample of the input signal arrives at the input to the array, the array multiplication should be completed. This requires the multiplication as well as the accumulation to be carried out using hardware elements. There are two approaches to solve this problem.
A dedicated MAC unit may be implemented in hardware, which integrates the multiplier and the accumulator in a single hardware unit. This approach is adopted by the Motorola DSP processor DSP5600X. The other approach is to have the multiplier and the accumulator separate. For example, in the Texas Instruments DSP processor 320C5X, the output of the multiplier is stored into the product register. The content of this register in turn can be added to the accumulator register ACC in the central ALU. In both of the above approaches, the MAC operation can be completed in one clock cycle. The presence of hardware multipliers and/or a multiplier accumulator is one of the mandatory requirements of a P-DSP.

Fig. 2.1 Implementation of convolver with single multiplier/adder

In Fig. 2.1, $y_n$, the output at the nth sampling instant, is obtained by multiplying the array $\mathbf{x}_n = [x_n \; x_{n-1} \; \ldots \; x_{n-M+1}]$, corresponding to the present and the past $M-1$ samples of the input, with the array $\mathbf{h} = [h_0 \; h_1 \; \ldots \; h_{M-1}]$, corresponding to the impulse response sequence. To obtain $y_{n+1}$, the input signal array $\mathbf{x}_{n+1}$ is multiplied with the array $\mathbf{h}$. The vector $\mathbf{x}_{n+1}$ is obtained by shifting the array $\mathbf{x}_n$ towards the right so that the $(n+1)$th sample of the input data, $x_{n+1}$, becomes the first element and all the elements of $\mathbf{x}_n$ are shifted towards the right by 1 position, so that the $i$th element of $\mathbf{x}_n$ becomes the $(i+1)$th element of $\mathbf{x}_{n+1}$. Instead of shifting the elements of $\mathbf{x}_n$ towards the right all at a time after finishing the vector multiplication, each of the elements may be shifted separately soon after the MAC operation that uses that element is over. For example, after obtaining the product $h_{M-1} x_{n-M+1}$, the element $x_{n-M+1}$ may be made equal to $x_{n-M+2}$. Similarly, after obtaining the product $h_{M-2} x_{n-M+2}$, the element $x_{n-M+2}$ may be made equal to $x_{n-M+3}$,
and so on. This is achieved in P-DSPs by using a special instruction called MACD, multiply accumulate with data shift. For example, the TMS320C5X has the instruction MACD pma, dma, which multiplies the content of the program memory location pma with the content of the data memory location with address dma and stores the result in the product register. The content of the product register is added to the accumulator before the new product is stored. Further, the content of dma is copied to the next location, whose address is dma + 1.

MODIFIED BUS STRUCTURES AND MEMORY ACCESS SCHEMES IN P-DSPs 2.2

It may be noted that the MAC operation with data move (i.e. the MACD instruction) requires four memory accesses per instruction cycle. (An instruction cycle is the time that elapses from the fetching of an instruction until that instruction completes execution, including the time taken for writing the result into a register or memory. Many of the instructions in P-DSPs, including the MACD instruction, require only one processor clock period per instruction cycle. In the conventional microprocessors one instruction cycle corresponds to several clock periods.) The four memory accesses/clock period required for the MACD instruction are as follows:

1. Fetch the MACD instruction from the program memory
2. Fetch one of the operands from the program memory
3. Fetch the second operand from the data memory
4. Write the content of the data memory with address dma into the location with the address dma + 1

The relatively static impulse response coefficients are stored in the program memory and the samples of the input data are stored in the data memory. If the MACD instruction is to be executed in a machine with Von Neumann architecture, it requires four clock cycles. This is because in the Von Neumann architecture shown in Fig. 2.2 there is a single address bus and a single data bus for accessing the program as well as the data memory area.
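The MACD loop and its four memory accesses per tap can be sketched in software. The Python fragment below is only an illustration under stated assumptions: the function name `fir_macd`, the lists standing in for program and data memory, and the access counter are all hypothetical, and the product-register pipelining of the real TMS320C5X instruction is collapsed into a single multiply-add per tap.

```python
def fir_macd(coeffs, delay_line):
    """Compute one FIR output sample with a MACD-style data move per tap.

    coeffs[k]     models the program memory and holds h_k.
    delay_line[k] models the data memory and holds x_{n-k}; it needs one
                  spare location at the end to receive the oldest sample.
    Returns (y_n, number_of_memory_accesses).
    """
    m = len(coeffs)
    if len(delay_line) != m + 1:
        raise ValueError("delay_line needs len(coeffs) + 1 locations")

    acc = 0
    accesses = 0
    # Start with the oldest sample so that each copy to address k + 1
    # overwrites a value that has already been used.
    for k in range(m - 1, -1, -1):
        accesses += 1              # 1. fetch the MACD opcode (program memory)
        h = coeffs[k]
        accesses += 1              # 2. fetch the coefficient (program memory)
        x = delay_line[k]
        accesses += 1              # 3. fetch the sample (data memory)
        acc += h * x               # multiply and accumulate
        delay_line[k + 1] = x
        accesses += 1              # 4. copy dma to dma + 1 (data memory write)
    return acc, accesses


if __name__ == "__main__":
    h = [1, 2, 3]                  # h_0, h_1, h_2
    line = [4, 5, 6, 0]            # x_n, x_{n-1}, x_{n-2}, spare location
    y, n_access = fir_macd(h, line)
    print(y, n_access, line)
```

With the values shown, the output sample is 32, the loop performs 12 memory accesses (four per tap), and the delay line ends as [4, 4, 5, 6], so writing the next input sample into location 0 completes the shift. Four accesses per tap is the reason a single-bus machine would need four clock cycles for each MACD.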
One of the ways by which the number of clock cycles required for the memory access can be reduced is to use more than one bus for both address and data. For example, in the Harvard architecture shown in Fig. 2.3, there are two separate buses for the program and data memory. Hence the content of program memory and data memory can be accessed in parallel. The instruction code can be fed from the program memory to the control unit while the operand is fed to the processing unit from the data memory. The processing unit, consisting of the registers and processing elements such as MAC units, multiplier, ALU, shifter, etc., is also referred to as the data path.

Fig. 2.2 Von Neumann architecture
Fig. 2.3 Harvard architecture
Fig. 2.4 Modified Harvard architecture

The P-DSPs follow the modified Harvard architecture shown in Fig. 2.4. One set of buses is used to access a memory that has both program and data and another set to access a memory that has data alone. Data can also be transferred from one memory to another. The modified Harvard architecture is used in several P-DSPs, for example P-DSPs from Texas Instruments and Analog Devices.

With the Harvard architecture, the number of memory accesses/clock cycle was shown to be two. This can be increased further by using more buses. For example, by using three separate address and data buses, the number of memory accesses/clock cycle can be increased to three. Motorola DSP5600X, DSP96002, etc. have three separate buses. TMS320C54X has four address buses. Since the cost of an IC increases with the number of pins in the IC, extending a number of buses outside the chip would unduly increase the price.
Hence the P-DSPs use multiple buses only for connecting the on-chip memory to the control unit and data path. For accessing off-chip memory only a single bus is used for accessing both the program memory and data memory. Because of this, any operation that involves an off-chip memory is slow compared to that using the on-chip memory.

MULTIPLE ACCESS MEMORY 2.3

The number of memory accesses/clock period can also be increased by using a high speed memory that permits more than one memory access/clock period. For example, the DARAM, the dual access RAM, permits two memory accesses/clock period. Multiple access RAM may be connected to the processing unit of the P-DSP by using the Harvard architecture. For example, DARAM connected to a P-DSP with two independent data and address buses can be used to achieve four memory accesses/clock period.

MULTIPORTED MEMORY 2.4

Another technique that is adopted for increasing the number of accesses/clock period is to use multiported memory. For example, the dual port memory has two independent data and address buses as shown in Fig. 2.5 and hence two memory accesses can be achieved in a clock period. Multiported memories dispense with the need for storing the program and data in two different memory chips in order to permit simultaneous access to both program and data memory. However, one of the major limitations of the dual ported memory is the increase in the cost compared to two single port memories of the same total capacity. This is because of the increased number of pins and larger chip area required for the dual ported memory. A larger number of I/O pins requires a larger and more expensive package and a larger die size.

Fig. 2.5 Block diagram of a dual ported memory

Some P-DSPs combine the modified Harvard architecture with the dual ported memories.
For example, the Motorola DSP 561XX processors have a single ported program memory and a dual ported data memory. Hence one program memory access and two data memory accesses can be achieved per clock period.

VLIW ARCHITECTURE 2.5

Another architecture used for P-DSPs, for example in TMS320C6X, is the very long instruction word (VLIW) architecture. These P-DSPs have a number of processing units (data paths). In other words, they have a number of ALUs, MAC units, shifters, etc. The VLIW is fetched from memory and is used to specify the operands and operations to be performed by each of the data paths. As shown in Fig. 2.6, the multiple functional units share a common multiported register file for fetching the operands and storing the results. Parallel random access by the functional units to the register file is facilitated by the read/write cross bar. Execution of the operations in the functional units is carried out concurrently with the load/store operation of data between a RAM and the register file.

The performance gains that can be achieved with VLIW architecture depend on the degree of parallelism in the algorithm selected for a DSP application and on the number of functional units. The throughput will be higher only if the algorithm involves execution of independent operations. For example, in the convolver of Fig. 2.1, by using eight functional units, the time required for convolution can be reduced by a factor of 8 compared to the case where a single functional unit is used. However, it may not always be possible to have independent streams of data for processing. Further, the number of functional units is also limited by the hardware cost for the multiported register file and cross bar switch.

Fig. 2.6 Block diagram of the VLIW architecture

PIPELINING 2.6

One of the approaches adopted for increasing the efficiency of the advanced microprocessors as well as P-DSPs is instruction pipelining. An instruction cycle, starting with the fetching of an instruction and ending with the execution of the instruction including the time for storage of the results, can be split into a number of microinstructions. Execution of each of the microinstructions is also referred to as one phase of an instruction. For example, an instruction cycle requiring four microinstructions can be said to be in four phases as follows:

1. Fetch phase in which the instruction is fetched from the program memory
2. Decode phase in which the instruction is decoded
3. Memory read phase in which the operand required for the execution of the instruction may be read from the data memory
4. Execution phase in which execution as well as the storage of the results in either one of the registers or memory is carried out

Each of the above microinstructions may be carried out separately by four functional units. Let us assume that each of the above four phases takes equal time for completion. In this case, in a conventional microprocessor with no pipelining, each of the functional units is busy only 25% of the time. This is because only one instruction is processed at the CPU at a time. Figure 2.7 shows when each of the functional units is busy when a program containing three instructions I1, I2, I3 is executed.

Fig. 2.7 Instruction cycles of a processor with no pipelining
Fig. 2.8 Instruction cycles of a processor with pipelining

The functional units can be kept busy almost all the time by processing a number of instructions simultaneously in the CPU. For example, in a machine with four functional units, four instructions I1, I2, I3 and I4 can be processed simultaneously as shown in Fig. 2.8. When I1 enters the decode phase, I2 can enter the opcode fetch phase. When I1 enters the operand read phase, I2 enters the decode phase and I3 enters the opcode fetch phase. When I1 enters the execute phase, I2 enters the operand read phase, I3 enters the decode phase and I4 enters the opcode fetch phase. The pipeline is fully loaded now and all the functional units have useful work to do. The instructions that follow I4 keep the functional units busy till the program is exited.

Let T denote the time required for each phase of the instruction. One clock cycle of the processor corresponds to T. In a period of 12T only three instructions can be executed in a machine without pipelining. In the same period nine instructions can be executed as shown in Fig. 2.8. Hence the throughput is increased by a factor of 3 in this case. It may be noted that the initial latency of a machine with four phases is 4T: the first instruction completes after 4T and each subsequent instruction completes one clock period later. Hence for executing a program with N instructions, the time required for execution is (N + 3)T, which agrees with the nine instructions completed in 12T. With a non-pipelined machine, the time required for executing N instructions is 4NT.

The instruction pipeline shown in Fig. 2.8 corresponds to a highly optimistic case. In the case of processors requiring a single clock cycle for the execution of each of the instructions in the program, the throughput shown in Fig. 2.8 can be achieved. This is normally achieved with reduced instruction set computers (RISC). However, in complex instruction set computers (CISC), there are also multiple-word instructions requiring multiple clock cycles for execution. In this case all the functional units cannot be kept busy all the time.
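The timing argument for Figs. 2.7 and 2.8 can be checked numerically. The following Python sketch assumes the idealised four-phase machine of the text (every phase takes one clock period T and no stalls occur); the function names are hypothetical and chosen only for this illustration.

```python
def time_without_pipeline(n_instructions, n_phases=4):
    """Clock periods needed when only one instruction is in the CPU at a time."""
    return n_instructions * n_phases


def time_with_pipeline(n_instructions, n_phases=4):
    """Clock periods needed when a new instruction enters the pipeline every period."""
    if n_instructions == 0:
        return 0
    # The first instruction needs n_phases periods; each later instruction
    # finishes one period after its predecessor.
    return n_phases + n_instructions - 1


def pipeline_occupancy(n_instructions, n_phases=4):
    """Return one row per clock period.

    row[p] is the 1-based number of the instruction occupying phase p
    (0 = fetch, 1 = decode, 2 = operand read, 3 = execute), or None if
    that phase is idle in the given period.
    """
    rows = []
    for t in range(time_with_pipeline(n_instructions, n_phases)):
        row = []
        for p in range(n_phases):
            i = t - p              # index of the instruction that reached phase p
            row.append(i + 1 if 0 <= i < n_instructions else None)
        rows.append(row)
    return rows


if __name__ == "__main__":
    print(time_without_pipeline(3))     # three instructions, no pipelining
    print(time_with_pipeline(9))        # nine instructions, pipelined
    for row in pipeline_occupancy(4):
        print(row)
```

Three instructions without pipelining and nine instructions with pipelining both take 12 periods, matching the factor of 3 quoted for Fig. 2.8. In the occupancy table for four instructions, the fourth row reads [4, 3, 2, 1]: I4 is being fetched while I1 executes, which is the fully loaded state described in the text.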
Call and branch instructions of a P-DSP illustrate this loss of throughput: four phases (T states) are required for the call/branch instruction to exit the execution phase. By that time two more single word instructions or one double word instruction will have entered the instruction pipeline. These instructions should not be executed. Hence two words have to be flushed out of the instruction pipeline before the instructions are fetched starting from the new program address.

To overcome this problem, some of the P-DSPs have special branch/call and return instructions called delayed branch/call/return instructions. When the delayed branch instruction is executed, the program branches to the new program address only after the two 1-word instructions or the single 2-word instruction following the branch instruction are completely executed. Similarly, when the delayed call instruction is executed, the program calls the subroutine only after the two 1-word instructions or the single 2-word instruction following the call instruction are completely executed. When the delayed call/branch/return instructions are executed, there is no need for flushing the pipeline and maximum throughput is obtained. Examples of pipeline operation of delayed as well as undelayed branch/call instructions are given in Chapter 4.

The throughput efficiency of the pipeline may also be reduced because of conflicts between the instructions that are in different phases of the instruction pipeline. This happens if the same memory is used to store the data and program and there is only a single address bus for addressing both the program and data memory. This is true in the case of off-chip memory. For example, an instruction in the fetch phase may try to fetch the instruction code from a memory chip that is also accessed by another instruction that is in the operand read phase.
To avoid the conflict, the operand read phase will be done first and the opcode fetch phase will be repeated until there is no longer a conflict.

The number of instructions that are processed simultaneously in the CPU, also referred to as the depth of the instruction pipeline, differs in different families of P-DSPs. The pipeline depths of some of the P-DSPs are given in Table 2.1.

Table 2.1 Instruction pipeline depth of some P-DSPs

P-DSP name/family      Pipeline depth
Analog Devices         2
Motorola DSP5600X      3
TI TMS320C3X           4
TI TMS320C54X          6

SPECIAL ADDRESSING MODES IN P-DSPs 2.7

In addition to the addressing modes such as direct, indirect and immediate supported by the conventional microprocessors, P-DSPs have special addressing modes that permit a single word/instruction format and thereby speed up the execution by making effective use of the instruction pipelining. Further, there are also special addressing modes such as circular addressing and bit reversed addressing that are specifically tailored for DSP applications. The details of these addressing modes are presented next.

2.7.1 Short Immediate Addressing

This permits the operand to be specified using a short constant that forms part of a single word instruction. The length of the short constant depends on the instruction type and the P-DSP. For example, in the case of TI TMS320C3X, an 8-bit constant can be specified as one of the operands in the single word instructions for addition, subtraction, AND, OR, XOR, etc.
2.7.2 Short Direct Addressing

This permits the lower order address of the operand of an instruction to be specified in the single word instruction. In the TI TMS320 DSPs, the higher order 9 bits of the memory address are stored in the data page pointer and only the lower 7 bits are specified as a part of the instruction. Each contiguous block of 128 words is referred to as one page in the TI DSPs. The argument in the instruction specifies only the location within the current page. In the Motorola DSP5600X, short direct memory addressing permits a 6-bit address to be specified in the instruction.

2.7.3 Memory-mapped Addressing

The CPU registers and the I/O registers of the P-DSPs are also accessible as memory locations. This is achieved by storing them in either the starting page or the final page of the memory space. For example, in TMS320C5X, page 0 corresponds to the CPU registers and I/O registers. In the case of Motorola DSP5600X, the last page of the memory space, containing 64 locations, is used as the memory map for the CPU and I/O registers. When these registers are accessed using memory mapped addressing modes, the higher address bits are not taken from the data page pointer and are instead made to be 0 in the case of TI DSPs and made to be 1 in Motorola DSPs.

2.7.4 Indirect Addressing

In P-DSPs this addressing mode has a number of options. It permits an array of data that is to be processed in the P-DSP to be efficiently fetched and stored. The address of the operands can be stored in one of the registers called indirect address registers. In the case of TI processors the indirect address registers are called auxiliary registers (ARs). Any of these registers can be updated while the operand fetched using that register is being processed. This is made possible by having an additional ALU in the CPU core specifically for the indirect address registers or ARs.
The ARs may be incremented or decremented either in steps of 1 or in steps specified by the content of an offset register. In the case of TI processors, the offset register is called the INDX register. In the P-DSPs from Analog Devices it is called the modifier register. The content of the indirect address registers may also be updated by a constant using the bit reversed addressing mode explained in the next section. In the TI C5X processors the new address computed by the auxiliary ALU is not used for fetching the operand for the current instruction that is being decoded and executed. It is used for fetching the operand of the next instruction that uses the indirect addressing mode with this particular AR. For this reason, the indirect addressing mode used in TI C5X P-DSPs is called indirect addressing mode with post-increment/decrement. In Motorola DSP563XX, the updated indirect address register content may also be used to fetch the operand for the current instruction. Hence this mode is called the indirect addressing mode with pre-increment/decrement. In TI TMS320C54X processors both post-increment/decrement and pre-increment/decrement operations are supported.

2.7.5 Bit Reversed Addressing Mode

The bit reversed number representation is explained in Section 1.14. The binary pattern corresponding to a particular decimal number is obtained by writing the natural binary equivalent of the number in the reverse order, so that the most significant bit of the natural binary number becomes the least significant bit of the bit reversed number and vice versa. For the computation of the FFT, the data is to be arranged in the bit reversed order and the 2-point DFT of the resulting sequence is to be computed first. In the bit reversed addressing mode, when a 16-point FFT is to be computed, the 2-point DFT of X(0) and X(8) is to be found, similarly the 2-point DFT of X(4) and X(12), and so on. It may be noted from Table 1. that the values 0, 8, 4, 12 correspond to the consecutive numbers in the bit reversed number representation. In the bit reversed addressing mode, the address is incremented/decremented by the number represented in the bit reversed form.

2.7.6 Circular Addressing

In real time processing of signals, the input signal is continuously stored in the memory. The processed data is stored in another memory space continuously and may be written onto the output device. In this case the input as well as the output program will be simple. However, since the input as well as the output memory space will be finite in size, the entire memory space would be exhausted after processing the input signal for some time, if the data is written into the memory by using the linear addressing mode. One way to overcome this problem is to keep checking whether the range of either the input or the output memory space is exceeded. In that case, the new data is to be stored starting from the beginning of the particular memory space. However, checking this condition is an overhead that can be overcome using the circular addressing mode. In this mode, the memory can be organised as a circular buffer with the beginning memory address and the ending memory address corresponding to this buffer defined by the programmer. In the circular addressing mode, when the address pointer is incremented, the address will be checked against the ending memory address of the circular buffer. If it exceeds that, the address will be made equal to the beginning address of the circular buffer.

ON-CHIP PERIPHERALS 2.8

The P-DSPs have a number of on-chip peripherals that relieve the CPU from routine functions. Further, they also help to reduce the chip count of a DSP system built around a P-DSP.
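The bit reversed and circular addressing modes of Sections 2.7.5 and 2.7.6 are carried out by address-generation hardware, but their effect can be mimicked in software. The Python sketch below is a hedged illustration only: `bit_reverse` and `circular_increment` are hypothetical helper names, and the circular update handles just the step-of-one case described in the text.

```python
def bit_reverse(index, n_bits):
    """Reverse the n_bits-wide binary pattern of index."""
    result = 0
    for _ in range(n_bits):
        result = (result << 1) | (index & 1)   # move the current LSB to the output
        index >>= 1
    return result


def circular_increment(addr, start, end):
    """Advance addr by one location inside the circular buffer [start, end].

    If the incremented address exceeds the ending address, it is made
    equal to the beginning address of the buffer.
    """
    addr += 1
    if addr > end:
        addr = start
    return addr


if __name__ == "__main__":
    # Data order for a 16-point FFT (4 address bits).
    print([bit_reverse(i, 4) for i in range(16)])

    # Walk a four-location circular buffer occupying addresses 100..103.
    addr = 100
    visited = []
    for _ in range(6):
        visited.append(addr)
        addr = circular_increment(addr, 100, 103)
    print(visited)
```

The first four bit reversed indices are 0, 8, 4, 12, the sequence quoted for the 16-point FFT, and the circular walk visits 100, 101, 102, 103, 100, 101 without any range check in the calling loop.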
Some of the on-chip peripherals in the P-DSPs and their functions are as follows.

2.8.1 On-chip Timer

Two of the common applications of the timers are generation of periodic interrupts to the P-DSPs and generation of the sampling clocks for the A/D converters. The timer mode can be programmed by the P-DSPs. The timers can generate a single pulse or a periodic train of pulses. They can also generate a single square wave or a periodic square wave. The period of the timer is also made programmable.

2.8.2 Serial Port

This enables the data communication between the P-DSP and an external peripheral such as an A/D converter, D/A converter or an RS232C device. These ports normally have input and output buffers so that the P-DSP writes to or reads from the serial port in parallel form and the serial port sends and receives data to and from the peripherals in serial form. They also generate interrupts when the serial port output buffer is empty or the input buffer is full. These devices have parallel to serial and serial to parallel converters built into them. The shift clock can be supplied either by the P-DSP or by an external device. The serial ports can operate either in the asynchronous mode or in the synchronous mode. In the asynchronous mode, the transmit data and receive data lines alone are used for communication and the bit clock is not transmitted from either end. In the case of synchronous mode, both the bit clock and a frame sync signal that indicates the beginning of the first bit of the data are transmitted from the serial port to the I/O device and also from the I/O port to the serial port. An example of the two signals with respect to the transmitted data is shown in Fig. 2.9.

Fig. 2.9 Burst mode serial port receive operation (FSR: receive frame sync; CLKR: receive clock; DR: receive data)

2.8.3 TDM Serial Port

The P-DSPs have a special serial port called the TDM serial port. This permits a P-DSP to communicate with other devices or P-DSPs by using time division multiplexing (TDM). One of the devices can generate the frame sync pulse, which indicates the beginning of a TDM frame, and the bit clock, which sets the duration for which a bit is to be transmitted. As shown in Fig. 2.10 the TDM frame is split into a number of equal slots and each slot can be allotted to one of the devices.

Fig. 2.10 TDM frame with 8 time slots

For example, in Fig. 2.10, there are 8 slots/frame and this is referred to as a TDM with eight channels. In each of the slots, a number of bits may be transmitted by a channel. The TDM serial port normally uses four lines for the purpose of serial communication. They are:

TFRM: the frame sync signal
TCLK: the bit clock
TADD: the address of the serial device that is outputting data in a particular TDM slot
TDAT: the data transmitted into the TDM channel by the authorised device

The signals TADD and TDAT are bidirectional and are tristate controlled so that only one of the devices transmits the data and address on these lines at a time. Any one of the devices can generate the TFRM and clock signals and they are used by the other devices as a reference. A scheme where eight devices are interconnected using the TDM serial port is shown in Fig. 2.11. An example of how the TI TMS320C5X can be configured to be one of the devices is shown in Fig. 2.12. An example of each of the devices outputting 16-bit data (D15-D0) in its slot, and also the address of the device (A0-A15) which is supposed to receive this data, is shown in Fig. 2.13.

Fig. 2.11 Interconnecting 8 devices using the TDM serial port over a 4-line bus
Fig. 2.12 TMS320C5X configured to be one of the TDM devices
Fig. 2.13 Data transfer using TDM channel

2.8.4 Parallel Port

Parallel ports enable communication between the P-DSP and other devices to be faster compared to serial communication by using a number of lines in parallel. In addition, they also have additional lines for strobing or for handshaking purposes. The P-DSPs have two approaches for assigning lines for the parallel port. In one approach, used by TI, the data bus itself is used for parallel ports. This is achieved by allocating a specific address space for I/O; whenever this address space is addressed using the I/O instructions, the parallel port signals including the handshaking signals are sent over the data bus. In another approach, separate lines are dedicated for parallel ports including the handshaking signals.

2.8.5 Bit I/O Ports

The P-DSPs have additional I/O ports that are single bit wide. These port bits may be individually set, reset or read. These bits are normally used for control purposes but they can also be used for data transfer. There are no handshaking signals for these I/O ports. Some of these bits are also used for conditional branching or calls. For example, in TI processors there are instructions such as branch if I/O zero.

2.8.6 Host Port

The P-DSPs also have a special parallel port, normally 8 or 16 bits wide, called the host port that enables them to communicate with a microprocessor or PC, which is called the host. In addition to data communication, the host can generate interrupts and also cause the P-DSP to load a program from ROM to the RAM on reset. Almost all the P-DSPs, including the ones from Analog Devices, Motorola and TI, have host ports.

2.8.7 Comm Ports

These are parallel ports that are used for interprocessor communication between a number of identical P-DSPs in a multiprocessor system.
For example, a multiprocessor system may be built using a number of TI TMS320C4X processors. For the purpose of communication of the data between these processors, six comm ports each of width 8 bits are provided. Since the data to be processed may be 32 or more bits wide, the P-DSPs have provision for splitting the data into streams of 8 bits and also for assembling the 8-bit units into words of 32 bits. The Analog Devices DSP ADSP 2106X has 6 comm ports, each of which is 4 bits wide.

2.8.8 On-Chip A/D and D/A Converters

Some of the P-DSPs targeted towards voice applications such as cellular telephones and tapeless answering machines have A/D and D/A converters inside the P-DSP. For example, Motorola DSP 561XX and Analog Devices ADSP 21MSP5X both have the A/D and D/A on chip and permit effective sampling rates of about 8 kHz.

2.8.9 P-DSPs with RISC and CISC

P-DSPs may be implemented using either the RISC processor or the CISC processor. For example, TI TMS320C6X P-DSPs use a RISC processor and a large number of P-DSPs from Analog Devices, Motorola and TI make use of CISC. For example, TI TMS320C54X, the Motorola DSP563XX and Analog Devices ADSP 2100X make use of CISC. TI TMS320C8X has a RISC and four P-DSPs with CISC on a single chip. The relative advantages of each of these processors are as follows.

RISC Advantages

The chip area dedicated to the realisation of the control unit is considerably reduced because of the reduced number of instructions. About 20% of the chip area may be used for the control unit in RISC. In CISC processors about 30-40% of the chip area is used up for the control unit.
Therefore in a RISC there is more area available for incorporating other features.

As a result of the considerable reduction in the control area, the CPU registers and the data paths (processing units) can be replicated and the throughput of the processor can be increased by applying pipelining and parallel processing.

In a RISC, all the instructions are of uniform length and take the same time for execution. Hence the dummy periods or hold periods in the instruction pipeline are reduced to the minimum. This increases the computational speed.

A simpler and smaller control unit in RISC has fewer gates. This reduces the propagation delay and increases the speed.

The reduced number of instructions, formats and addressing modes results in a simpler and smaller decoder, which, in turn, increases the speed.

In RISC processors, the delayed branch and call instructions can be effectively used and they improve the speed.

HLL support: Writing the programs in C and C++ relieves the programmer from learning the instruction set of a P-DSP and allows the programmer to concentrate on the application instead. It increases the throughput of the programmer. Since RISC has a smaller number of instructions, the compiler for any HLL is shorter and simpler. The availability of a relatively large number of CPU registers permits a more efficient code optimisation by maximising the use of CPU registers over slower memories.

CISC Advantages

Some of the advantages of RISC also turn out to be disadvantages when viewed from a different perspective. The CISC processors have a very rich instruction set that even supports high level language constructs similar to "if condition true then do", "for" and "while". The P-DSPs with CISC also have instructions specifically required for DSP applications such as MACD, FIRS, etc. This makes the application program written in the assembly language shorter and easier to follow.
Since RISC has a smaller number of instructions, the implementation of a single CISC instruction might require a number of instructions in RISC. This increases the memory required for storing the program, and the traffic between the CPU and memory is increased. This on the one hand increases the computation time and on the other hand makes the program difficult to debug.
The HLL compilers are costly by several orders of magnitude compared to the P-DSPs themselves. For P-DSPs with RISC architecture, compilers are essential. For most of the low cost applications, DSP platforms without the compilers are preferred. Hence a majority of P-DSPs are CISC based.
The P-DSP manufacturers have tried to keep the codes for the new processors upward compatible with the older processors. This makes the learning curve steeper.

The relative disadvantages of each of these architectures are diminishing. By making the RISC processors applicable for larger and larger applications, the cost of the chip per se and the compiler costs are being brought down. The HLL compilers for the CISC processors are also becoming as efficient as hand assembly and the costs are coming down. Hence the distinction between the two in terms of cost and debugging efficiency is likely to narrow down further. The Code Composer Studio from TI permits programming in HLL as well as assembly language in a single development environment, so that the best features of both HLL and assembly language programming can be used by the programmer.

Review Questions
2.1 Explain why a MAC operation is implemented in hardware in programmable DSPs.
2.2 Explain how convolution is performed using a single MAC unit.
2.3 Explain the difference between a MAC instruction and a MAC with data shift instruction. When is the latter instruction preferred?
2.4 Explain the difference between Von Neumann and Harvard architecture for the computer. Which architecture is preferred for DSP applications and why?
2.5 Explain why the P-DSPs have multiple address and data buses for internal memory and peripherals but have only a single address and data bus for the external memory and peripherals.
2.6 Explain the different techniques adopted for increasing the number of memory accesses/instruction cycle.
2.7 Explain how a higher throughput is obtained using the VLIW architecture. Give an example of a DSP that has VLIW architecture.
2.8 Explain what is meant by instruction pipelining. Explain with an example how pipelining increases the throughput efficiency.
2.9 Explain how delayed branch/call instructions are superior to the undelayed branch/call instructions.
2.10 Explain the memory mapped addressing mode used in P-DSPs.
2.11 What are the different ways in which the operand for instructions can be specified using indirect addressing mode?
2.12 What is meant by bit reversed addressing mode? What is the application for which this addressing mode is preferred?
2.13 What is meant by circular addressing mode? What is the application for which this addressing mode is preferred?
2.14 Mention some applications of the on-chip timer in P-DSPs.
2.15 Distinguish between the synchronous and asynchronous modes of operation of serial ports.
2.16 Explain the operation of TDM serial ports in P-DSPs.
2.17 What is the use of host ports in P-DSPs? How do they differ from the comm ports?
2.18 List the relative merits and demerits of RISC and CISC processors.

Self Test Questions
2.1 The features in which a P-DSP is superior to advanced microprocessors are
(a) Low cost (b) Low power (c) Computational speed (d) Real time I/O capability
2.2 In modified Harvard architecture, for fetching the content of program and data memory a separate bus is used for ______ memory and a single bus is used for ______ memory.
2.3 Number of memory accesses/clock period that can be achieved using on-chip DARAM of a P-DSP is ______
@ 2 3 ws
2.4 VLIW architecture differs from a conventional P-DSP in which of the following aspects:
(a) Instruction cache (b) Number of functional units (c) Use of pipelining (d) A single word fetched from memory has a number of instructions
2.5 A P-DSP has four pipeline stages and uses a four-phase clock. The number of clock cycles required for executing a program with 25 instructions is ______
(29 (2825S
2.6 The number of instruction cycles required for executing a program in a microprocessor with no pipelining is ______
@ 2 © Wa
2.7 The addressing mode that is convenient for FFT computation is ______
(a) Indirect addressing (b) Circular mode (c) Bit reversed addressing (d) Memory mapped
2.8 The addressing mode that permits the content of the internal registers of the CPU and I/O to be accessed as memory locations is ______
(a) Indirect addressing (b) Circular mode (c) Bit reversed addressing (d) Memory mapped
2.9 The serial port that permits the data from a number of I/O devices to be sent using a single serial port is called ______
(a) Comm port (b) Host port (c) Time division multiplexing (d) Bit I/O port
2.10 Which of the following characteristics are true for a RISC processor?
(a) Smaller control unit (b) Small instruction set (c) Short program length (d) Less traffic between CPU and memory

ARCHITECTURE OF TMS320C5X

INTRODUCTION 3.1
Leading manufacturers of integrated circuits such as Texas Instruments (TI), Analog Devices and Motorola manufacture digital signal processor (DSP) chips. These manufacturers have developed a range of DSP chips with varied complexity. The underlying concepts are broadly the same. Some of these concepts are discussed in Chapter 2.
In order to give a feel for the design of systems with DSP chips, this chapter gives some details on the design of systems using the TMS320C5X DSP chip (denoted in brief as 5X) manufactured by TI.
The TMS320 DSP family consists of two types of single-chip DSPs: 16-bit fixed-point and 32-bit floating-point. These DSPs possess the operational flexibility of high-speed controllers and the numerical capability of array processors. Combining these two qualities, the TMS320 processors are inexpensive alternatives to custom fabricated VLSI and multichip bit-slice processors. The TMS320C5X belongs to the fifth generation of TI's TMS320 family of DSPs. The first five generations of the TMS320 family are C1X, C2X, C3X, C4X and C5X. The C1X, C2X, C2XX and C5X are 16-bit fixed-point processors. Instruction sets of the higher generation fixed-point processors are upward compatible with the lower generation fixed-point processors. For example, the C5X can execute the instructions of both C1X and C2X. The 54X is upward compatible with the 5X. C3X and C4X are 32-bit floating-point processors, and the C4X is upward compatible with the C3X instruction set. The sixth generation C6X devices feature VelociTI(TM), an advanced very long instruction word (VLIW) architecture developed by TI, and can execute 1600 MIPS. The eighth generation C8X devices have, on a single piece of silicon, a number of advanced DSPs (ADSPs) and a RISC master processor. Typical applications of the above families of TI DSPs are as follows:
C1X, C2X, C2XX, C5X, C54X: toys, hard disk drives, modems, cellular phones and active car suspensions.
C3X: filters, analysers, hi-fi systems, voice mail, imaging, barcode readers, motor control, 3D graphics or scientific processing.
C4X: parallel-processing clusters in virtual reality, image recognition, telecom routing, and parallel-processing systems.
C6X: wireless base stations, pooled modems, remote-access servers, digital subscriber loop systems, cable modems and multichannel telephone systems.
C8X: video telephony, 3D computer graphics, virtual reality and a number of multimedia applications.

The TI DSP chips have IC numbers with the prefix TMS320. If the next letter is C (e.g. TMS320C5X), it indicates that CMOS technology is used for the IC and the on-chip non-volatile memory is a ROM. If it is E (e.g. TMS320E5X), it indicates that the technology used is CMOS and the on-chip non-volatile memory is an EPROM. If it is neither (e.g. TMS3205X), it indicates that NMOS technology is used for the IC and the on-chip non-volatile memory is a ROM. Under C5X itself there are three processors, 'C50, 'C51 and 'C53, that have an identical instruction set but differ in the capacity of on-chip ROM and RAM. The characteristics of some of the TMS320 family DSP chips are given in Table 3.1.
The instruction set of the TMS320C5X and other DSP chips is superior to the instruction set of conventional microprocessors such as the 8085, Z80, etc., as most of the instructions require only a single cycle for execution. The multiply-accumulate operation, used quite frequently in signal processing applications such as convolution, requires only one cycle in a DSP.

Table 3.1 Characteristics of some of the TMS320 family DSP chips
'C25 'C30 'C50 'C541
Cycle time (ns) 200 toa oO 2
On-chip RAM aK 4K 4K 2k 3k
Total memory 4K 128K 1M 128K 138K
Parallel ports 8 16 1M 6K 6K

Architecture of TMS320C5X DSPs
The block diagram of the internal architecture of the C5X is shown in Fig. 3.1. The 320C5X DSPs are said to have advanced Harvard architecture because they have separate memory bus structures for program and data and have instructions that enable data transfer between the program and data memory areas.

BUS STRUCTURE 3.2
Separate program and data buses allow simultaneous access to program instructions and data,
providing a high degree of parallelism. For example, while data is multiplied, a previous product can be loaded into, added to or subtracted from the accumulator and, at the same time, a new address can be generated. Such parallelism supports a powerful set of arithmetic, logic and bit-manipulation operations that can all be performed in a single machine cycle. In addition, the 'C5X includes the control mechanisms to manage interrupts, repeated operations and function calling. The 'C5X architecture has four buses and their functions are as follows:
Program bus (PB): It carries the instruction code and immediate operands from program memory space to the CPU.
Program address bus (PAB): It provides addresses to program memory space for both reads and writes.
Data read bus (DB): It interconnects various elements of the CPU to data memory space.
Data read address bus (DAB): It provides the address to access the data memory space.
The program and data buses can work together to transfer data from on-chip data memory and internal or external program memory to the multiplier for single-cycle multiply/accumulate operations.

The multiplier unit in the 'C5X processors performs 16 x 16 multiplication of numbers represented in 2's complement form. The 32-bit PREG holds the result of the multiplication. The 16-bit temporary register 0 (TREG0) holds the multiplicand. The other operand for the multiplication can be specified using one of the addressing modes.
The 0-16-bit left barrel shifter and right barrel shifter in the CALU permit the contents of memory to be left shifted by 0 to 16 bits before they are either fed to the ALU or stored from the ALU to memory. The CPU registers ACC and PREG can also be shifted using these shifters. In this case they require two cycles.
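As a rough illustration of the arithmetic just described, the Python sketch below multiplies two 16-bit 2's complement words into a 32-bit product and left-shifts a 16-bit memory word by 0 to 16 bits. The helper names are made up for this example, the shifted word is treated as unsigned (sign extension is not modelled), and the vacated low-order bits are assumed to be zero-filled.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a 2's complement number."""
    value &= (1 << bits) - 1
    if value & (1 << (bits - 1)):
        return value - (1 << bits)
    return value


def multiply_16x16(treg0, operand):
    """Return the 32-bit product register pattern for two 16-bit words."""
    product = to_signed(treg0, 16) * to_signed(operand, 16)
    return product & 0xFFFFFFFF  # keep the 32 bits held by the product register


def prescale(word, shift):
    """Left-shift a 16-bit memory word by 0..16 bits with zero fill."""
    if not 0 <= shift <= 16:
        raise ValueError("shift must be between 0 and 16")
    return ((word & 0xFFFF) << shift) & 0xFFFFFFFF


print(hex(multiply_16x16(0xFFFF, 0x0002)))  # (-1) * 2 -> 0xfffffffe
print(hex(multiply_16x16(0x8000, 0x8000)))  # (-32768)^2 -> 0x40000000
print(hex(prescale(0x0001, 16)))            # 0x10000
```

The second product shows why a 32-bit result register is sufficient: the largest magnitude product of two 16-bit 2's complement numbers, 2^30, still fits in 32 bits.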
A 5-bit register TREG1 specifies the number of bits by which the scaling shifter should shift either the incoming data to one of the CPU registers or vice versa. When the incoming data to the CPU is left shifted by the scaling shifter, the LSBs are filled with 0.

AUXILIARY REGISTER ALU (ARAU) 3.4
It consists of eight 16-bit auxiliary registers (ARs) AR0-AR7, a 3-bit auxiliary register pointer (ARP) and an unsigned 16-bit ALU. The ARAU calculates indirect addresses by using inputs from the ARs, the 16-bit index register (INDX) and the auxiliary register compare register (ARCR). The ARAU can autoindex the current AR while the data memory location is being addressed and can index either by ±1 or by the contents of the INDX. As a result, accessing data does not require the CALU for address manipulation; therefore, the CALU is free for other operations in parallel. This makes instructions execute faster than on conventional microprocessors. For example, let us consider the following sequence of 8085 instructions:
MOV A, M
INX H
These instructions load the accumulator using indirect addressing mode and then increment the HL register pair used as the address pointer. These two instructions can be replaced by a single 5X instruction, LACC *+, 0. Further, any one of the auxiliary registers can be used as the address pointer and incremented by the above instruction. The register that will be used is specified by the content of the ARP. The auxiliary registers AR0-AR7 may also be used as general purpose registers for holding the operands for arithmetic and logical operations in the CALU. Some of the other registers of the ARAU and their functions are as follows:

INDEX REGISTER (INDX) 3.5
The 16-bit INDX is used by the ARAU as a step value (addition or subtraction by more than 1) to modify the address in the ARs during indirect addressing. For example, when the ARAU steps across a row of a matrix, the indirect address is incremented by 1.
However, when the ARAU steps down a column, the address is incremented by the dimension of the matrix. The ARAU can add or subtract the value stored in the INDX from the current AR as part of the indirect address operation. INDX can also map the dimension of the address block used for bit-reversal addressing.

AUXILIARY REGISTER COMPARE REGISTER (ARCR) 3.6
The 16-bit ARCR is used for address boundary comparison. The CMPR instruction compares the ARCR to the selected AR and places the result of the compare in the TC bit of ST1.

BLOCK MOVE ADDRESS REGISTER (BMAR) 3.7
The 16-bit BMAR holds an address value to be used with block moves and multiply/accumulate operations. This register provides the 16-bit address for an indirect-addressed second operand.

BLOCK REPEAT REGISTERS (RPTC, BRCR, PASR, PAER) 3.8
All these registers are 16 bits wide. The repeat counter register (RPTC) holds the repeat count in a repeat single-instruction operation and is loaded by the RPT and RPTZ instructions. The block repeat counter register (BRCR) holds the count value for the block repeat feature. This value is loaded before a block repeat operation is initiated. The block repeat program address start register (PASR) indicates the 16-bit address where the repeated block of code starts. The block repeat program address end register (PAER) indicates the 16-bit address where the repeated block of code ends. The PASR and PAER are loaded by the RPTB instruction.

PARALLEL LOGIC UNIT (PLU) 3.9
It performs Boolean operations or the bit manipulations required of high-speed controllers. The PLU can set, clear, test or toggle bits in a status register, control register, or any data memory location. The PLU allows logic operations to be performed on data memory values directly without affecting the contents of the ACC or PREG. Results of a PLU function are written back to the original data memory location.
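The read-modify-write behaviour of the PLU described above can be pictured with a short Python model. The class name `PluModel`, its method names and the dictionary standing in for data memory are hypothetical; the sketch only illustrates that each bit operation acts on the 16-bit memory word in place, that a bit test leaves its result in a test flag, and that the accumulator is left untouched.

```python
MASK16 = 0xFFFF  # data memory words are 16 bits wide


class PluModel:
    """Toy model of PLU-style bit operations on data memory."""

    def __init__(self):
        self.memory = {}  # hypothetical data memory: address -> 16-bit word
        self.acc = 0      # accumulator, never modified by the methods below
        self.tc = 0       # test flag updated by test_bit

    def _read(self, addr):
        return self.memory.get(addr, 0) & MASK16

    def set_bits(self, addr, mask):
        self.memory[addr] = (self._read(addr) | mask) & MASK16

    def clear_bits(self, addr, mask):
        self.memory[addr] = self._read(addr) & ~mask & MASK16

    def toggle_bits(self, addr, mask):
        self.memory[addr] = (self._read(addr) ^ mask) & MASK16

    def test_bit(self, addr, bit):
        self.tc = (self._read(addr) >> bit) & 1
        return self.tc


plu = PluModel()
plu.acc = 0x1234
plu.memory[0x60] = 0x00F0
plu.set_bits(0x60, 0x0001)     # 0x00F0 -> 0x00F1
plu.clear_bits(0x60, 0x0010)   # 0x00F1 -> 0x00E1
plu.toggle_bits(0x60, 0x8000)  # 0x00E1 -> 0x80E1
plu.test_bit(0x60, 15)         # bit 15 is now set, so tc becomes 1
print(hex(plu.memory[0x60]), plu.tc, hex(plu.acc))
```

Because the result goes straight back to memory, a control flag can be changed without first saving and later restoring the accumulator.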
MEMORY-MAPPED REGISTERS 3.10
The 'C5X has 96 registers mapped into page 0 of the data memory space. All 'C5X DSPs have 28 CPU registers and 16 input/output (I/O) port registers but have different numbers of peripheral and reserved registers. Since the memory-mapped registers are a component of the data memory space, they can be written to and read from in the same way as any other data memory location. The memory-mapped registers are used for indirect data address pointers, temporary storage, CPU status and control, or integer arithmetic processing through the ARAU.

PROGRAM CONTROLLER 3.11
The program controller contains logic circuitry that decodes the instructions, manages the CPU pipeline, stores the status of CPU operations and decodes the conditional operations. Parallelism of architecture lets the C5X perform three concurrent memory operations in any given machine cycle: fetch an instruction, read an operand and write an operand. The program controller consists of the following elements:
16-bit program counter (PC)
16-bit status registers ST0 and ST1, processor mode status register (PMST) and circular buffer control register (CBCR)
(8 x 16)-bit hardware stack
Address generation logic
Instruction register
Interrupt flag register and interrupt mask register

Fig. 3.2(a) Status register 0 (ST0) bit assignment

SOME FLAGS IN THE STATUS REGISTERS 3.12
The status registers can be stored into data memory and loaded from data memory, thereby allowing the 'C5X status to be saved and restored for subroutines. ST0 and ST1 each have an associated 1-level deep shadow register stack for automatic context-saving when an interrupt trap is taken. These registers are automatically restored upon a return from interrupt. The bit assignment details for ST0 and ST1 are given in Fig. 3.2. The significance of the various bits of ST0 and ST1 is as follows:
ARP (Auxiliary Register Pointer): These bits select the AR to be used in indirect addressing.
When the ARP is loaded, the previous ARP value is copied to the auxiliary register buffer (ARB) in ST1.
OV (Overflow) flag bit: This bit indicates that an arithmetic operation overflow has occurred in the ALU.
OVM (Overflow Mode) bit: This bit enables/disables the accumulator overflow saturation mode in the ALU.
INTM (Interrupt Mode) bit: This bit globally masks or enables all interrupts. The INTM bit has no effect on the non-maskable RS and NMI interrupts.
DP (Data Memory Page Pointer) bits: These bits specify the address of the current data memory page. The DP bits are concatenated with the 7 LSBs of an instruction word to form a direct memory address of 16 bits.

Fig. 3.2(b) Status register 1 (ST1) bit assignment

ARB (Auxiliary Register Buffer): This 3-bit field holds the previous value contained in the ARP in ST0. Whenever the ARP is loaded, the previous ARP value is copied to the ARB, except when using the LST #0 instruction. When the ARB is loaded using the LST #1 instruction, the same value is also copied to the ARP. This is useful when restoring context (when not using the automatic context save) in a subroutine that modifies the current ARP.
CNF (On-chip RAM configuration control bit): This 1-bit field enables the on-chip dual-access RAM block 0 (DARAM B0) to be addressable in data memory space or program memory space. The CNF bit can be modified by the LST #1 instruction. If CNF is 0, the on-chip DARAM block 0 is mapped into data memory space. The CNF bit can be cleared by a reset or the CLRC CNF instruction. When CNF is 1, the on-chip DARAM block 0 is mapped into program memory space. The CNF bit can be set by the SETC CNF instruction.
TC (Test/control flag bit): This 1-bit flag stores the results of the ALU or parallel logic unit (PLU) test bit operations. The status of the TC bit determines whether the conditional branch, call and return instructions are to be executed.
SXM (Sign-extension mode bit): This 1-bit field enables/disables sign extension of an arithmetic operation. The SXM bit does not affect the operations of certain arithmetic or logical instructions: the ADDC, ADDS, SUBB or SUBS instruction suppresses sign extension, regardless of SXM.
C (Carry bit): This 1-bit field indicates an arithmetic operation carry or borrow in the ALU. The single-bit shift and rotate instructions affect the C bit.
HM (Hold mode bit): This 1-bit field determines whether the central processing unit (CPU) stops or continues execution when acknowledging an active HOLD signal.
XF (XF pin status bit): This 1-bit field determines the level of the external flag (XF) output pin.
PM (Product shift mode bits): This 2-bit field determines the product shifter (P-SCALER) mode and shift value for the PREG output into the ALU. Table 3.2 gives the PM bits and the function performed.

Table 3.2 PM bits and the function performed
PM bits (b1 b0)   P-SCALER mode for PREG output
0 0               No shift
0 1               Left-shifted 1 bit; LSB zero-filled
1 0               Left-shifted 4 bits; 4 LSBs zero-filled
1 1               Right-shifted 6 bits; sign extended; 6 LSBs lost. The product is always sign extended, regardless of the value of the SXM bit.

ON-CHIP MEMORY 3.13
The 'C5X architecture contains a considerable amount of on-chip memory to aid in system performance and integration:
Program Read-Only Memory (ROM)
Data/Program Dual-Access RAM (DARAM)
Data/Program Single-Access RAM (SARAM)
The 'C5X has a total address range of 224K words x 16 bits. The memory space is divided into four individually selectable memory segments: 64K-word program memory space, 64K-word local data memory space, 64K-word I/O ports and 32K-word global data memory space.

3.13.1 Program ROM
All 'C5X DSPs carry a 16-bit on-chip maskable programmable ROM (see Fig. 3.1 for sizes). Some of the 'C5X DSPs have boot loader code resident in the on-chip ROM, and the other 'C5X DSPs offer the boot loader code as an option.
This memory is used for booting program code from slower external ROM or EPROM to fast on-chip or external RAM. Once the custom program has been booted into RAM, the boot ROM space can be removed from program memory space by setting the MP/MC bit in the processor mode status register (PMST). The on-chip ROM is selected at reset by driving the MP/MC pin low. If the on-chip ROM is not selected, the 'C5X devices start execution from off-chip memory.

3.13.2 Data/Program Dual-Access RAM
All 'C5X DSPs carry a 1056-word x 16-bit on-chip dual-access RAM (DARAM). The DARAM is divided into three individually selectable memory blocks: 512-word data or program DARAM block B0, 512-word data DARAM block B1 and 32-word data DARAM block B2. The DARAM is primarily intended to store data values but, when needed, can be used to store programs as well. DARAM blocks B1 and B2 are always configured as data memory; however, DARAM block B0 can be configured by software as data or program memory. DARAM improves the operational speed of the 'C5X CPU. The CPU operates with a 4-deep pipeline. In this pipeline, the CPU reads data on the third stage and writes data on the fourth stage. Hence, for a given instruction sequence, the second instruction could be reading data at the same time the first instruction is writing data. The dual data buses (DB and DAB) allow the CPU to read from and write to DARAM in the same machine cycle.

3.13.3 Data/Program Single-Access RAM
Almost all 'C5X DSPs carry a 16-bit on-chip single-access RAM (SARAM) of sizes varying from 1K to 9K (16-bit) words. Code can be booted from an off-chip ROM and then executed at full speed once it is loaded into the on-chip SARAM. The SARAM can be configured by software as data memory, as program memory or as a combination of both data memory and program memory. The SARAM is divided into 1K- and/or 2K-word blocks contiguous in address memory space. All 'C5X CPUs support parallel accesses to these SARAM blocks. However, one SARAM block can be accessed only once per machine cycle.
In other words, the CPU can read from or write to one SARAM block while accessing another SARAM block.

3.13.4 On-Chip Memory Protection
The 'C5X DSPs have a maskable option that protects the contents of on-chip memories. When the related bit is set, no externally originating instruction can access the on-chip memory spaces.

ON-CHIP PERIPHERALS 3.14
All 'C5X DSPs have the same CPU structure; however, they have different on-chip peripherals connected to their CPUs. The 'C5X DSP on-chip peripherals available are as follows:
Clock Generator
Hardware Timer
Software-Programmable Wait-State Generators
Parallel I/O Ports
Host Port Interface (HPI)
Serial Port
Buffered Serial Port (BSP)
Time-Division Multiplexed (TDM) Serial Port
User-Maskable Interrupts

3.14.1 Clock Generator
The clock generator consists of an internal oscillator and a phase-locked loop (PLL) circuit. The clock generator can be driven internally by a crystal resonator circuit or driven externally by a clock source. The PLL circuit can generate an internal CPU clock by multiplying the clock source by a specific factor, and so a clock source with a frequency lower than that of the CPU can be used.

3.14.2 Hardware Timer
A 16-bit hardware timer with a 4-bit prescaler is available. This programmable timer clocks at a rate that is between 1/2 and 1/32 of the machine cycle rate (CLKOUT1), depending upon the timer's divide-down ratio. The timer can be stopped, restarted, reset or disabled by specific status bits. Three registers control and operate the timer. The timer counter register (TIM) gives the current count of the timer. The timer period register (PRD) defines the period for the timer. The 16-bit timer control register (TCR) controls the operations of the timer.

3.14.3
Software-Programmable Wait-State Generators
Software-programmable wait-state logic is incorporated in 'C5X DSPs, allowing wait-state generation without any external hardware for interfacing with slower off-chip memory and I/O devices. This feature consists of multiple wait-state generating circuits. Each circuit is user-programmable to operate in different wait states for off-chip memory accesses.

3.14.4 Parallel I/O Ports
A total of 64K I/O ports are available; 16 of these ports are memory-mapped in data memory space. Each of the I/O ports can be addressed by the IN or the OUT instruction. The memory-mapped I/O ports can be accessed with any instruction that reads from or writes to data memory. The IS signal indicates a read or write operation through an I/O port. The 'C5X can easily interface with external I/O devices through the I/O ports while requiring minimal off-chip address decoding circuits.

3.14.5 Host Port Interface (HPI)
The HPI is available on the 'C57S and 'LC57. It is an 8-bit parallel I/O port that provides an interface to a host processor. Information is exchanged between the DSP and the host processor through on-chip memory that is accessible to both the host processor and the 'C57.

3.14.6 Serial Port
Three different kinds of serial ports are available: a general-purpose serial port, a time-division multiplexed (TDM) serial port and a buffered serial port (BSP). Each 'C5X contains at least one general-purpose, high-speed synchronous, full-duplexed serial port interface that provides direct communication with serial devices such as codecs, serial analog-to-digital (A/D) converters and other serial systems. The serial port is capable of operating at up to one-fourth the machine cycle rate (CLKOUT1). The serial port transmitter and receiver are double-buffered and individually controlled by maskable external interrupt signals. Data is framed either as bytes or as words.
Five 16-bit registers (SPC, DRR, DXR, XSR, RSR) control and operate the serial port interface. The serial port control (SPC) register contains the mode control and status bits of the serial port. The data receive register (DRR) holds the incoming serial data, and the data transmit register (DXR) holds the outgoing serial data. The data transmit shift register (XSR) controls the shifting of the data from the DXR to the output pin. The data receive shift register (RSR) controls the storing of the data from the input pin to the DRR.

3.14.7 Buffered Serial Port (BSP)
The BSP is available on the 'C56 and 'C57 devices. It consists of a full-duplexed, double-buffered serial port and an autobuffering unit (ABU). The BSP provides flexibility on the data stream length. The ABU supports high-speed data transfer and reduces interrupt latencies. The BSP has a 2K-word buffer, which resides in the 'C5X internal memory. Five BSP registers control and operate the BSP.

3.14.8 TDM Serial Port
The TDM serial port available on the 'C50, 'C51 and 'C53 devices is a full-duplexed serial port that can be configured by software either for synchronous operations or for time-division multiplexed operations. The TDM serial port is commonly used in multiprocessor applications.

3.14.9 User-Maskable Interrupts
Four external interrupt lines (INT1-INT4) and five internal interrupts, a timer interrupt and four serial port interrupts, are user maskable. When an interrupt service routine (ISR) is executed, the contents of the program counter are saved on an 8-level hardware stack, and the contents of 11 specific CPU registers, ACC, ACCB, PREG, ST0, ST1, PMST, TREG0, TREG1, TREG2, INDX and ARCR, are saved in a one-deep stack (shadow registers). When a return from interrupt instruction is executed, the CPU registers' contents are restored.

Review Questions
3.1 Mention a few applications of each of the families of TI DSPs.
3.2 What are the different buses of the TMS320C5X and their functions?
3.3 List the functional units in the CALU of the 5X and explain the source and destination of the operands of each of these.
3.4 List the various registers used with the ARAU and their functions.
3.5 What is meant by a memory mapped register? How is it different from a memory?
3.6 List the status register bits of the 5X and their functions.
3.7 Distinguish between the dual-access RAM and the single-access RAM used in the on-chip memory of the 5X.
3.8 List the on-chip peripherals in the 5X and their functions.
3.9 What are the various interrupt types supported by the 5X?
3.10 Draw the internal architecture diagram of the 5X and indicate the various blocks.

Self Test Questions
3.1 The 320C5X DSPs are said to have advanced Harvard architecture because
(a) they have separate memory bus structures for program and data
(b) they have instructions that enable data transfer between the program and data memory area
(c) they have the same memory bus structures for program and data
(d) the contents of program memory cannot be transferred into the data memory or vice versa
3.2 The central ALU of C5X DSP processors has a ______ bit ALU and one of the operands for the ALU operation comes from ______.
(a) 32, ACC (b) 16, ACC (c) 32, ACCB (d) 16, ACCB
3.3 The results of operations performed in the central ALU are stored in ______
(a) ACC (b) ACCB (c) TREG0 (d) PREG
3.4 The ALU register whose either higher order word or lower order word can be loaded from memory is ______
(a) ACC (b) ACCB (c) TREG0 (d) PREG
3.5 The ______ bit register used for temporary storage of the accumulator is ______
(a) 32, PREG (b) 32, ACCB (c) 16, TREG0 (d) 32, ACC
3.6 The ______ permits execution of logic operations on data without affecting the contents of the ACC.
(a) parallel logic unit (b) auxiliary ALU (c) central ALU
3.7 The hardware multiplier unit in the C5X processors performs multiplication of ______ times ______ bit numbers represented in ______ complement form.
(a) 16, 16, 1's (b) 8, 8, 1's (c) 16, 16, 2's (d) 8, 8, 2's
3.8 ______ holds the result of multiplication and is ______ bit wide.
(a) PREG, 32 (b) PREG, 16 (c) TREG0, 16 (d) TREG0, 32
3.9 The register in which the multiplicand is stored before multiplication is performed is ______ and is ______ bit wide.
(a) PREG, 32 (b) PREG, 16 (c) TREG0, 16 (d) TREG0, 32
3.10 ______ permits the contents of memory to be left shifted by 0-16 bits before they are either fed to the ALU or stored from the ALU to memory.
(a) Scaling shifter (aw (nu (d) Auxiliary ALU
3.11 The register that specifies the number of bits by which the scaling shifter should shift either the incoming data to one of the CPU registers or vice versa is ______ and is ______ bit wide.
(a) TREG1, 4 (b) TREG1, 5 (c) TREG2, 5 (d) TREG2, 4
3.12 When the incoming data to the CPU is left shifted by the scaling shifter the LSBs are filled with ______
(OG) O LSB before shifting
3.13 The bit of status register ST1 which determines whether the MSBs of the bits left shifted by the scaling shifter are zero or are sign extended is ______
(a) SXM (b) TC (c) OV (d) OVM
3.14 In the hardware stack of 5X processors ______ numbers can be stored,
GI OIE —|RR — AIG
3.15
The bit of status register 0 (ST0) that becomes 1 if overflow occurs from an ALU operation is ______
(a) SXM (b) OV (c) OVM (d) TC
3.16 The bit of ST0 that determines whether the ACC is replaced with either the largest positive or negative number or left unmodified is ______
(a) SXM (b) OV (c) OVM (d) TC (e) C
3.17 The bit of ST1 that is used for testing whether a particular memory is zero or not, or for comparing one register against another register/memory, is ______
(a) SXM (b) OV (c) OVM (d) TC (e) C
3.18 The bit of ST1 that becomes 1 if either addition generates a carry or subtraction results in a borrow is ______
(a) SXM (b) OV (c) OVM (d) TC (e) C
3.19 The status register bit that determines whether the multiplier's 32-bit product is left shifted by 0, 1 or 4, or right shifted by 6 with sign extension, before it is transferred/added to the ACC is ______
(a) PM (b) CNF (c) HM (d) XF (e) INTM
3.20 The RAM configuration control bit that indicates whether the on-chip reconfigurable dual-access RAM is mapped to data space or program space is ______
(a) PM (b) CNF (c) HM (d) XF (e) INTM
3.21 The bit of the status register that determines whether the processor halts the internal operation while acknowledging a hold or not is ______
(a) PM (b) CNF (c) HM (d) XF (e) INTM
3.22 The ______ bit of the status register indicates the status of the general purpose output pin.
(a) PM (b) CNF (c) HM (d) XF (e) INTM
3.23 The pointers that are contained in status register 0 are ______
(a) ARP (b) DP (c) ARB (d) IPTR (e) INTM
3.24 The pointers that are contained in status register 1 are ______
(a) ARP (b) DP (c) ARB (d) IPTR (e) INTM
3.25 If the ______ bit is set to 0, all unmasked interrupts are enabled. Otherwise all the maskable interrupts are disabled.
(a) ARP (b) INTM (c) DP (d) ARB (e) IPTR
