
Computer Architecture and Organization

Unit-3
Central Processing Unit

Computer Science & Engineering Department

[email protected]
9829121431
 Outline
• General Register Organization
• Stack Organization
• Instruction format
• Addressing Modes
• Data transfer and manipulation
• Program Control
• Reduced Instruction Set Computer (RISC)
• Complex Instruction Set Computer (CISC)
• Questions asked in RTU exam
[Figure: Bus-organized CPU. Registers R1–R7 (with load inputs driven by a clock) feed two multiplexers, MUX A and MUX B, selected by SELA and SELB, which drive the A bus and B bus into the ALU; the ALU operation is selected by OPR, and its output returns over the output bus to a destination register chosen by SELD through a 3x8 decoder.]
General Register Organization
 Example: R1 ← R2 + R3
 To perform the above operation, the control must provide binary selection variables to the
following selector inputs:
1. MUX A selector (SELA): to place the content of R2 into bus A.
2. MUX B selector (SELB): to place the content of R3 into bus B.
3. ALU operation selector (OPR): to provide the arithmetic addition A + B.
4. Decoder destination selector (SELD): to transfer the content of the output bus into R1.

 Control Word:

SELA SELB SELD OPR

Pradeep Jha 6CS4-04: Computer Architecture and Organization 5


General Register Organization

Encoding of Register Selection Fields

Binary Code   SELA    SELB    SELD
000           Input   Input   None
001           R1      R1      R1
010           R2      R2      R2
011           R3      R3      R3
100           R4      R4      R4
101           R5      R5      R5
110           R6      R6      R6
111           R7      R7      R7

Encoding of ALU Operations

OPR Select   Operation       Symbol
00000        Transfer A      TSFA
00001        Increment A     INCA
00010        A + B           ADD
00101        A - B           SUB
00110        Decrement A     DECA
01000        A AND B         AND
01010        A OR B          OR
01100        A XOR B         XOR
01110        Complement A    COMA
10000        Shift right A   SHRA
11000        Shift left A    SHLA

Pradeep Jha 6CS4-04: Computer Architecture and Organization 6


General Register Organization (Numerical)
 A bus-organized CPU has 16 registers with 32 bits each, an ALU, and a destination decoder.
1. How many multiplexers are there in the bus, and what is the size of each multiplexer?
2. How many selection inputs are needed for MUX A and MUX B?
3. How many inputs and outputs are there in the decoder?
4. How many inputs and outputs are there in the ALU for data, including input and output carries?
5. Formulate a control word for the system, assuming that the ALU has 35 operations.
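The field widths asked for in part 5 can be checked with a short sketch (Python here; the 16-register and 35-operation figures come from the problem statement, and the function name is illustrative):

```python
import math

def control_word_layout(n_regs=16, n_ops=35):
    """Bit widths for the SELA/SELB/SELD/OPR fields of the control word."""
    sel = math.ceil(math.log2(n_regs))   # 4 bits select one of 16 registers
    opr = math.ceil(math.log2(n_ops))    # 6 bits cover 35 operations (2^5 = 32 < 35)
    return {"SELA": sel, "SELB": sel, "SELD": sel, "OPR": opr,
            "total": 3 * sel + opr}

print(control_word_layout())  # total control word = 4 + 4 + 4 + 6 = 18 bits
```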



Stack Organization
 A stack is a storage device that stores information in such a manner that the item stored last is
the first item retrieved (LIFO).
 The register that holds the address for the stack is called a stack pointer (SP) because its value
always points at the top item in the stack.
 The physical registers of a stack are always available for reading or writing. It is the content of
the word that is inserted or deleted.
 There are two types of stack organization
1. Register stack – built using registers
2. Memory stack – logical part of memory allocated as stack



Register Stack
 PUSH Operation
SP ← SP + 1
M[SP] ← DR
IF (SP = 0) then (FULL ← 1)
EMTY ← 0

 POP Operation
DR ← M[SP]
SP ← SP - 1
IF (SP = 0) then (EMTY ← 1)
FULL ← 0

[Figure: 64-word register stack with addresses 0 to 63; items A, B, C occupy addresses 1 to 3 with SP pointing at C; one-bit FULL and EMTY flags; DR holds the word being pushed or popped.]



Register Stack
 A stack can be placed in a portion of a large memory or it can be organized as a collection of a
finite number of memory words or registers. Figure shows the organization of a 64-word
register stack.
 The stack pointer register SP contains a binary number whose value is equal to the address of
the word that is currently on top of the stack.
 In a 64-word stack, the stack pointer contains 6 bits because 2^6 = 64.
 Since SP has only six bits, it cannot hold a value greater than 63 (111111 in binary).
 The one-bit register FULL is set to 1 when the stack is full, and the one-bit register EMTY is set
to 1 when the stack is empty of items.
 DR is the data register that holds the binary data to be written into or read out of the stack.
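As a sketch of the behavior described above, the PUSH/POP microoperations and the FULL/EMTY flags can be modeled in Python (a 64-word stack is assumed, as in the figure; the class name is illustrative, not from the source):

```python
class RegisterStack:
    """64-word register stack with SP and one-bit FULL/EMTY flags."""
    def __init__(self, size=64):
        self.mem = [0] * size
        self.size = size
        self.sp = 0            # points at the top item; 0 when the stack is empty
        self.full = False
        self.emty = True

    def push(self, dr):        # PUSH: SP <- SP + 1, M[SP] <- DR
        if self.full:
            raise OverflowError("stack full")
        self.sp = (self.sp + 1) % self.size
        self.mem[self.sp] = dr
        if self.sp == 0:       # SP wrapped around: all 64 words are in use
            self.full = True
        self.emty = False

    def pop(self):             # POP: DR <- M[SP], SP <- SP - 1
        if self.emty:
            raise IndexError("stack empty")
        dr = self.mem[self.sp]
        self.sp = (self.sp - 1) % self.size
        if self.sp == 0:
            self.emty = True
        self.full = False
        return dr
```

Note that the flag updates follow the slide's microoperations literally; a hardware SP simply wraps modulo 64, which the `%` operations mimic.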



Memory Stack
 PUSH Operation
SP ← SP - 1
M[SP] ← DR

 POP Operation
DR ← M[SP]
SP ← SP + 1

[Figure: Memory partitioned into a program segment (instructions from address 1000, addressed by PC), a data segment (operands from address 2000, addressed by AR), and a stack segment; the stack grows toward lower addresses, with SP pointing at the current top (3998 in the figure) and items at 3997 through 4001; DR holds the word transferred.]



Memory Stack
 The implementation of a stack in the CPU is done by assigning a portion of memory to a stack
operation and using a processor register as a stack pointer.
 Figure shows a portion of computer memory partitioned into three segments: program, data,
and stack.
 The program counter PC points at the address of the next instruction in the program which is
used during the fetch phase to read an instruction.
 The address register AR points at an array of data which is used during the execute phase to
read an operand.
 The stack pointer SP points at the top of the stack which is used to push or pop items into or
from the stack.
 We assume that the items in the stack communicate with a data register DR.
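The memory-stack behavior described here, with the stack growing toward lower addresses, can be sketched as follows (the 3000–4001 address range is taken from the figure; the class name is illustrative):

```python
class MemoryStack:
    """Stack in a reserved memory region; grows toward lower addresses."""
    def __init__(self, lower=3000, upper=4001):
        self.mem = {}          # sparse "memory"
        self.lower = lower     # lowest address the stack may use
        self.sp = upper        # empty stack: SP sits just below the stack region

    def push(self, dr):        # PUSH: SP <- SP - 1, M[SP] <- DR
        if self.sp - 1 < self.lower:
            raise OverflowError("stack overflow")
        self.sp -= 1
        self.mem[self.sp] = dr

    def pop(self):             # POP: DR <- M[SP], SP <- SP + 1
        if self.sp not in self.mem:
            raise IndexError("stack underflow")
        dr = self.mem.pop(self.sp)
        self.sp += 1
        return dr
```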



Reverse Polish Notation
 The common mathematical method of writing arithmetic expressions imposes difficulties when
evaluated by a computer.
 The Polish mathematician Lukasiewicz showed that arithmetic expressions can be represented
in prefix notation as well as postfix notation.

Infix            Prefix (Polish)   Postfix (Reverse Polish)
A + B            + A B             A B +
A * B + C * D    + * A B * C D     A B * C D * +
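Conversion from infix to reverse Polish can be automated; the slides do not name an algorithm, so this Python sketch uses the well-known shunting-yard method (with left-associative operators) as one common way to produce the postfix forms in the table:

```python
def infix_to_postfix(tokens):
    """Convert an infix token list to reverse Polish (shunting-yard)."""
    prec = {'+': 1, '-': 1, '*': 2, '/': 2}
    out, stack = [], []
    for t in tokens:
        if t in prec:
            # pop operators of equal or higher precedence before pushing t
            while stack and stack[-1] in prec and prec[stack[-1]] >= prec[t]:
                out.append(stack.pop())
            stack.append(t)
        elif t == '(':
            stack.append(t)
        elif t == ')':
            while stack[-1] != '(':
                out.append(stack.pop())
            stack.pop()            # discard the '('
        else:
            out.append(t)          # operands go straight to the output
    while stack:
        out.append(stack.pop())
    return out

print(infix_to_postfix("A * B + C * D".split()))
# ['A', 'B', '*', 'C', 'D', '*', '+']
```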



Evaluation of Arithmetic Expression

(3 * 4) + (5 * 6)  →  postfix: 3 4 * 5 6 * +  →  42

Token:        3    4     *    5      6        *      +
Stack after:  3    3,4   12   12,5   12,5,6   12,30  42
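The stack evaluation traced above can be sketched as a small postfix evaluator (Python; a minimal sketch supporting only the four basic integer operators):

```python
def eval_rpn(tokens):
    """Evaluate a reverse Polish expression using an operand stack."""
    ops = {'+': lambda a, b: a + b, '-': lambda a, b: a - b,
           '*': lambda a, b: a * b, '/': lambda a, b: a // b}
    stack = []
    for t in tokens:
        if t in ops:
            b = stack.pop()          # top of stack is the second operand
            a = stack.pop()
            stack.append(ops[t](a, b))
        else:
            stack.append(int(t))     # operands are pushed as they appear
    return stack.pop()

print(eval_rpn("3 4 * 5 6 * +".split()))  # 42
```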



Instruction Formats
 Definition: An instruction format, usually symbolized by rectangular boxes, defines the layout of the bits of an instruction as they appear in a memory word or control register.
 Instructions are categorized into different formats with respect to the operand fields in the instructions.
 The number of address fields in the instruction format depends on the internal organization of the CPU.
 The three most common CPU organizations are: (1) single accumulator organization, (2) general register organization, (3) stack organization.
 Computers may have instructions of several different lengths containing varying numbers of addresses. Following are the types of instructions:
1. Three Address Instructions
2. Two Address Instruction
3. One Address Instruction
4. Zero Address Instruction
Three Address Instruction
 Computers with three-address instruction formats can use each address field to specify either
a processor register or a memory operand.
 The program in assembly language that evaluates X = (A + B) * (C + D) is shown below.
ADD R1, A, B R1← M[A]+ M[B]
ADD R2, C, D R2← M[C]+ M[D]
MUL X, R1, R2 M[X]← R1 * R2
 The advantage of three-address format is that it results in short programs when evaluating
arithmetic expressions.
 The disadvantage is that the binary-coded instructions require too many bits to specify three
addresses.



Two Address Instruction
 Two address instructions are the most common in commercial computers. Here again each
address field can specify either a processor register or a memory word.
 The program to evaluate X = (A + B) * (C + D) is as follows:

MOV R1, A R1← M[A]


ADD R1, B R1← R1+ M[B]
MOV R2, C R2← M[C]
ADD R2, D R2← R2+ M[D]
MUL R1, R2 R1← R1 * R2
MOV X, R1 M[X]← R1



One Address Instruction
 One address instructions use an implied accumulator (AC) register for all data manipulation.
 For multiplication and division there is a need for a second register.
 However, here we will neglect the second register and assume that the AC contains the result
of all operations.
 The program to evaluate X = (A + B) * (C + D) is

LOAD A AC← M[A]


ADD B AC← AC+M[B]
STORE T M[T]←AC
LOAD C AC← M[C]
ADD D AC← AC+M[D]
MUL T AC← AC*M[T]
STORE X M[X]←AC



Zero Address Instruction
 A stack-organized computer does not use an address field for the instructions ADD and MUL.
 The PUSH and POP instructions, however, need an address field to specify the operand that
communicates with the stack.
 The program to evaluate X = (A + B) * (C + D) will be written for a stack-organized computer.
 To evaluate arithmetic expressions in a stack computer, it is necessary to convert the
expression into reverse Polish notation.
PUSH A TOS← M[A]
PUSH B TOS← M[B]
ADD TOS←(A+B)
PUSH C TOS← M[C]
PUSH D TOS← M[D]
ADD TOS←(C+D)
MUL TOS←(C+D)*(A+B)
POP X M[X] ← TOS



RISC Instruction
 The instruction set of a typical RISC processor is restricted to the use of load and store
instructions when communicating between memory and CPU.
 All other instructions are executed within the registers of the CPU without referring to memory.
 A program for a RISC type CPU consists of LOAD and STORE instructions that have one
memory and one register address, and computational-type instructions that have three
addresses with all three specifying processor registers.
 The following is a program to evaluate X = (A + B) * (C + D)
LOAD R1, A R1← M[A]
LOAD R2, B R2← M[B]
LOAD R3, C R3← M[C]
LOAD R4, D R4← M[D]
ADD R1, R1, R2 R1← R1+R2
ADD R3, R3, R4 R3← R3+R4
MUL R1, R1, R3 R1← R1*R3
STORE X, R1 M[X] ← R1
Addressing Modes
 Definition: Execution of an instruction requires data, and this data may be present in the internal registers of the CPU or in some location in memory. There are many ways to specify the address of the data or operand; these ways are called addressing modes.
 The addressing mode specifies a rule for interpreting or modifying the address field of the instruction before the operand is actually referenced.
 Computers use addressing mode techniques for the purpose of accommodating one or both of the following provisions:
1. To give programming versatility to the user by providing such facilities as pointers to memory, counters for loop control, indexing of data, and program relocation.
2. To reduce the number of bits in the addressing field of the instruction.
 There are 7 basic addressing modes supported by the computer:
Immediate
Direct
Indirect
Register direct
Register indirect
Auto increment and decrement
Displacement
Addressing Modes
 Some of the standard terms used in addressing modes are
 Address (A): Content of an address field in the instruction that refers to a memory location.
 Register (R): Content of an address field in the instruction that refers to a register.
 Program Counter (PC): The program counter keeps track of the instructions of the program stored in memory. It holds the address of the instruction to be executed next and is incremented each time an instruction is fetched from memory.
 Effective Address (EA): The effective address is the address of the operand in a computational-type instruction. It is the memory address obtained from the computation dictated by the given addressing mode.



Immediate Addressing
 Operand is part of instruction
 Operand = address field
 e.g. ADD 5
 Add 5 to the contents of the accumulator
 5 is the operand

[Instruction format: Opcode | Operand]

 No memory reference to fetch data
 Fast
 Limited range



Direct Addressing
 In this mode the effective address is equal to the address part of the instruction.
 Effective address (EA) = address field (A)
 The operand resides in memory and its address is given directly by the address field of the
instruction.
 Address field contains address of operand
 e.g. ADD A
 Add contents of cell A to accumulator
 Look in memory at address A for operand
 Single memory reference to access data
 No additional calculations to work out effective address
 Limited address space



Direct Addressing Diagram

[Instruction format: Opcode | Address A. The address field A points directly at the operand in memory.]



Indirect Addressing
 In this mode the address field of the instruction gives the address where the effective address is
stored in memory.
 Control fetches the instruction from memory and uses its address part to access memory again
to read the effective address.
 The effective address in this mode is obtained from the memory cell pointed to by the address part of the instruction:

Effective address = M[address part of instruction]

 Memory cell pointed to by address field contains the address of (pointer to) the operand
 EA = (A)
 Look in A, find address (A) and look there for operand
 e.g. ADD (A)
 Add contents of cell pointed to by contents of A to accumulator
Indirect Addressing Diagram

[Instruction format: Opcode | Address A. Memory location A holds a pointer to the operand, which is read with a second memory access.]



Register Addressing
 Register addressing is similar to direct addressing. The only difference is that the address field refers to a register rather than a main memory address.
 Operand is held in the register named in the address field
 Effective address (EA) = R
 The advantages of register addressing are:
1) Only a small address field is needed in the instruction
2) No memory reference is required, shorter instructions, faster instruction fetch
Disadvantages
 Limited number of registers
 Address space is very limited



Register Addressing Diagram
[Instruction format: Opcode | Register address R. The register field selects the register that holds the operand.]



Register Indirect Addressing
 In this mode the instruction specifies a register in the CPU whose contents give the address of
the operand in memory.
 Before using a register indirect mode instruction, the programmer must ensure that the
memory address of the operand is placed in the processor register with a previous instruction.
 E.g. MOV [R1], R2
value of R2 is moved to the memory location specified in R1.
 Effective address (EA)=content of (R)=A
 Operand is in memory cell pointed to by contents of register R
 The advantage of this mode is that address field of the instruction uses fewer bits to select a
register than would have been required to specify a memory address directly.
 Large address space (2^n)
 One fewer memory access than indirect addressing



Register Indirect Addressing Diagram
[Instruction format: Opcode | Register address R. The selected register holds a pointer to the operand in memory.]



Auto increment or Auto decrement Mode
 This is similar to the register indirect mode except that the register is incremented or decremented after (or before) its value is used to access memory.
 When the address stored in the register refers to a table of data in memory, it is necessary to increment or decrement the register after every access to the table. This mode achieves that automatically.
 Auto increment
Effective address (EA) = content of R
R is incremented after the EA is used
 Auto decrement
R is decremented before its content is used
Effective address (EA) = content of R after the decrement



Displacement Addressing
 EA = A + (R)
 Address field hold two values
 A = base value
 R = register that holds displacement
 or vice versa



Displacement Addressing Diagram

[Instruction format: Opcode | Register R | Address A. The content of register R is added to A to form the effective address of the operand in memory.]



Base Register Addressing Mode
 In this mode the content of a base register is added to the address part of the instruction to
obtain the effective address.
 A base register is assumed to hold a base address and the address field of the instruction
gives a displacement relative to this base address.
 The base register addressing mode is used in computers to facilitate the relocation of
programs in memory.
Effective address = address part of instruction + content of base register



Relative Addressing
 In this mode the content of the program counter is added to the address part of the instruction
in order to obtain the effective address.
 The address part of the instruction is usually a signed number which can be either positive or
negative.

Effective address = address part of instruction + content of PC


 A version of displacement addressing
 R = program counter, PC
 EA = A + (PC)
 i.e. the operand is A cells away from the current location pointed to by PC
 Exploits locality of reference and cache usage



Indexed Addressing Mode
 In this mode the content of an index register is added to the address part of the instruction to
obtain the effective address.
 The index register is a special CPU register that contains an index value.
 The address field of the instruction defines the beginning address of a data array in memory.
 Each operand in the array is stored in memory relative to the beginning address.
Effective address = address part of instruction + content of index register



Addressing Modes (Numerical)
 A two-word instruction is stored in memory at an address designated by the symbol W. The address field of the instruction (stored at W + 1) is designated by the symbol Y. The operand used during the execution of the instruction is stored at an address symbolized by Z. An index register contains the value X. State how Z is calculated from the other addresses if the addressing mode of the instruction is
 Direct Addressing mode
 Indirect Addressing mode
 Relative Addressing mode
 Indexed Addressing mode



Addressing Modes (Numerical)
 Solution
PC = W + 2
Index register = X
Effective address = Z

Memory layout:
W       Opcode | Mode
W + 1   Address field = Y
W + 2   Next instruction
Z       Operand

Direct addressing
EA = value of address field
Z = Y
Indirect addressing
EA = memory content of address field
Z = M[Y]
Relative addressing
EA = value of PC + value of address field
Z = W + 2 + Y
Indexed addressing
EA = value of index register + value of address field
Z = X + Y
Addressing Modes (Example)
PC = 200
R1 = 400
XR = 100

Address   Memory
200       Load to AC | Mode
201       Address = 500
202       Next instruction
399       450
400       700
500       800
600       900
702       325
800       300

Find the effective address for:
1. Immediate
2. Direct
3. Indirect
4. Register direct
5. Register indirect
6. Auto increment
7. Auto decrement
8. Relative
9. Index
10. Base
Addressing Modes (Numerical)
• Immediate addressing mode: EA = 201
• Direct addressing mode: EA = 500
• Indirect addressing mode: EA = 800
• Register direct addressing mode: EA = R1
• Register indirect addressing mode: EA = 400
• Auto increment addressing mode: EA = 400
• Auto decrement addressing mode: EA = 399
• Relative addressing mode: EA = 202 + 500 = 702
• Index addressing mode: EA = 100 + 500 = 600
• Base register addressing mode: EA = 100 + 500 = 600
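The answers above can be reproduced with a small sketch (Python; the memory contents, register values, and the base value of 100 are the ones assumed in this example, and the function name is illustrative):

```python
# Memory contents and registers assumed in the example above
mem = {399: 450, 400: 700, 500: 800, 600: 900, 702: 325, 800: 300}

def effective_address(mode, a=500, next_pc=202, r1=400, xr=100):
    """Effective address for each mode; for immediate, 201 is the address
    of the word that holds the operand itself."""
    return {
        "immediate":         201,          # operand is in the instruction itself
        "direct":            a,            # EA = A
        "indirect":          mem[a],       # EA = M[A]
        "register_indirect": r1,           # EA = (R1)
        "autoincrement":     r1,           # EA = (R1), then R1 <- R1 + 1
        "autodecrement":     r1 - 1,       # R1 <- R1 - 1 first, then EA = (R1)
        "relative":          next_pc + a,  # EA = PC + A
        "indexed":           xr + a,       # EA = XR + A
        "base":              xr + a,       # EA = base register + A
    }[mode]

print(effective_address("indirect"))  # 800
```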



Addressing Modes (Numerical)
 An instruction is stored at location 300 with its address field at location 301. The address field has the value 400. A processor register R1 contains the number 200. Evaluate the effective address if the addressing mode of the instruction is:
1. Immediate
2. Direct
3. Indirect
4. Register direct
5. Register indirect
6. Auto increment and auto decrement
7. Displacement (relative, and index with R1 as the index register)

Address   Memory
300       Opcode | Mode
301       Address field = 400
302       Next instruction
400       Operand




Data transfer instructions
 Data transfer instructions move data from one place in the computer to another without
changing the data content.
 The most common transfers are between memory and processor registers, between processor
registers and input or output, and between the processor registers themselves.

Name       Mnemonic
Load       LD
Store      ST
Move       MOV
Exchange   XCH
Input      IN
Output     OUT
Push       PUSH
Pop        POP

The data manipulation instructions in a typical computer are usually divided into three basic types:
1. Arithmetic instructions
2. Logical and bit manipulation instructions
3. Shift instructions


Instructions
Arithmetic Instructions
Name                      Mnemonic
Increment                 INC
Decrement                 DEC
Add                       ADD
Subtract                  SUB
Multiply                  MUL
Divide                    DIV
Add with carry            ADDC
Subtract with borrow      SUBB
Negate (2's complement)   NEG

Logical & Bit Manipulation Instructions
Name                Mnemonic
Clear               CLR
Complement          COM
AND                 AND
OR                  OR
Exclusive-OR        XOR
Clear carry         CLRC
Set carry           SETC
Complement carry    COMC
Enable interrupt    EI
Disable interrupt   DI

Shift Instructions
Name                        Mnemonic
Logical shift right         SHR
Logical shift left          SHL
Arithmetic shift right      SHRA
Arithmetic shift left       SHLA
Rotate right                ROR
Rotate left                 ROL
Rotate right through carry  RORC
Rotate left through carry   ROLC


Program Control
 A program control type of instruction, when executed, may change the address value in the
program counter and cause the flow of control to be altered.
 The change in value of the program counter as a result of the execution of a program control
instruction causes a break in the sequence of instruction execution.

Name Mnemonic
Branch BUN
Jump JMP
Skip SKP
Call CALL
Return RET
Compare (by subtraction) CMP
Test (by ANDing) TST



Status Bit Conditions
[Figure: An 8-bit ALU with inputs A and B produces output F7–F0; the carries C7 and C8 and the output feed the four status bits V, Z, S, C, with a check-for-zero circuit on the 8-bit output F.]

 Bit C (carry) is set to 1 if the end carry C8 is 1. It is cleared to 0 if the carry is 0.
 Bit S (sign) is set to 1 if the highest-order bit F7 is 1. It is set to 0 if the bit is 0.
 Bit Z (zero) is set to 1 if the output is zero and Z = 0 if the output is not zero.
 Bit V (overflow) is set to 1 if the exclusive-OR of the last two carries (C7 and C8) is equal to 1, and cleared to 0 otherwise. This is the condition for an overflow when negative numbers are in 2's complement.
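The four status bits can be computed in software for an 8-bit addition; this Python sketch mirrors the definitions above (the function name is illustrative):

```python
def alu_add_flags(a, b):
    """8-bit add; returns (F, C, S, Z, V) as defined by the status-bit circuit."""
    total = a + b
    f = total & 0xFF
    c = (total >> 8) & 1                       # C: end carry C8
    s = (f >> 7) & 1                           # S: highest-order bit F7
    z = int(f == 0)                            # Z: output is zero
    c7 = (((a & 0x7F) + (b & 0x7F)) >> 7) & 1  # carry into bit 7 (C7)
    v = c7 ^ c                                 # V: XOR of the last two carries
    return f, c, s, z, v

print(alu_add_flags(0x7F, 0x01))  # (128, 0, 1, 0, 1): 2's-complement overflow
```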



Conditional Branch Instructions
Mnemonic   Branch Condition            Condition Tested
BZ         Branch if zero              Z = 1
BNZ        Branch if not zero          Z = 0
BC         Branch if carry             C = 1
BNC        Branch if no carry          C = 0
BP         Branch if plus              S = 0
BM         Branch if minus             S = 1
BV         Branch if overflow          V = 1
BNV        Branch if no overflow       V = 0

Unsigned compare conditions (A - B)
BHI        Branch if higher            A > B
BHE        Branch if higher or equal   A ≥ B
BLO        Branch if lower             A < B
BLOE       Branch if lower or equal    A ≤ B
BE         Branch if equal             A = B
BNE        Branch if not equal         A ≠ B

Signed compare conditions (A - B)
BGT        Branch if greater than      A > B
BGE        Branch if greater or equal  A ≥ B
BLT        Branch if less than         A < B
BLE        Branch if less or equal     A ≤ B
BE         Branch if equal             A = B
BNE        Branch if not equal         A ≠ B



Program Interrupt
 The interrupt procedure is, in principle, quite similar to a subroutine call except for three
variations:
1. The interrupt is usually initiated by an internal or external signal rather than from the execution of an
instruction
2. The address of the interrupt service program is determined by the hardware rather than from the address
field of an instruction
3. An interrupt procedure usually stores all the information necessary to define the state of the CPU rather than
storing only the program counter.
 After a program has been interrupted and the service routine has been executed, the CPU must return to exactly the same state it was in when the interrupt occurred. Only if this happens will the interrupted program be able to resume exactly as if nothing had happened.
 The state of the CPU at the end of the execute cycle (when the interrupt is recognized) is
determined from:
1. The content of the program counter
2. The content of all processor registers
3. The content of certain status conditions



Types of interrupts
 There are three major types of interrupts that cause a break in the normal execution of a
program. They can be classified as:
1. External interrupts
2. Internal interrupts
3. Software interrupts



1. External Interrupt
 External interrupts come from
 Input-output (I/O) devices
 Timing device
 Circuit monitoring the power supply
 Any other external source
 Examples that cause external interrupts are
 I/O device requesting transfer of data
 I/O device finished transfer of data
 Elapsed time of an event
 Power failure
 External interrupts are asynchronous.
 External interrupts depend on external conditions that are independent of the program being
executed at the time.



2. Internal interrupts (Traps)
 Internal interrupts arise from
 Illegal or erroneous use of an instruction or data.
 Examples of interrupts caused by internal error conditions like
 Register overflow
 Attempt to divide by zero
 invalid operation code
 stack overflow
 protection violation.
 These error conditions usually occur as a result of a premature termination of the instruction
execution.
 Internal interrupts are synchronous with the program. If the program is rerun, the internal
interrupts will occur in the same place each time.



3. Software interrupts
 A software interrupt is a special call instruction that behaves like an interrupt rather than a
subroutine call.
 The most common use of software interrupt is associated with a supervisor call instruction.
This instruction provides means for switching from a CPU user mode to the supervisor mode.
 When an input or output transfer is required, the supervisor mode is requested by means of a
supervisor call instruction. This instruction causes a software interrupt that stores the old CPU
state and brings in a new PSW that belongs to the supervisor mode.
 The calling program must pass information to the operating system in order to specify the
particular task requested.



Reduced Instruction Set Computer (RISC)
 Characteristics of RISC are as follows:
 Relatively few instructions
 Relatively few addressing modes
 Memory access limited to load and store instructions
 All operations done within the registers of the CPU
 Fixed-length, easily decoded instruction format
 Single-cycle instruction execution
 Hardwired rather than microprogrammed control
 A relatively large number of registers in the processor unit
 Use of overlapped register windows to speed-up procedure call and return
 Efficient instruction pipeline
 Compiler support for efficient translation of high-level language programs into machine language programs



Overlapped Register Window
 A characteristic of some RISC processors is their use of overlapped register windows to provide the passing of parameters and avoid the need for saving and restoring register values.
 Each procedure call results in the allocation of a new window consisting of a set of registers from the register file for use by the new procedure.

Window size = L + 2C + G
Register file = (L + C) * W + G

[Figure: Four overlapped windows for procedures A, B, C, and D over a 74-register file. R0–R9 are global registers common to all procedures. Each window has 10 local registers (R16–R25 local to A, R32–R41 local to B, R48–R57 local to C, R64–R73 local to D) and shares 6 registers with each neighbor: R26–R31 common to A and B, R42–R47 common to B and C, R58–R63 common to C and D, and R10–R15 common to D and A, since the windows are circular.]



Overlapped Register Window
 Windows for adjacent procedures have overlapping registers that are shared to provide the
passing of parameters and results.
 Suppose that procedure A calls procedure B.
 Registers R26 through R31 are common to both procedures, and therefore procedure A stores
the parameters for procedure B in these registers.
 Procedure B uses local registers R32 through R41 for local variable storage.
 If procedure B calls procedure C, it will pass the parameters through registers R42 through R47.
 When procedure B is ready to return at the end of its computation, the program stores results
of the computation in registers R26 through R31 and transfers back to the register window of
procedure A.
 Note that registers R10 through R15 are common to procedures A and D because the four
windows have a circular organization with A being adjacent to D.
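The two window-sizing formulas can be checked against the figure's numbers (10 global, 10 local, and 6 common registers per window, 4 windows; the function name is illustrative):

```python
def register_windows(G=10, L=10, C=6, W=4):
    """Window size and total register file for an overlapped-window scheme."""
    window_size = L + 2 * C + G        # registers visible to one procedure
    register_file = (L + C) * W + G    # total physical registers needed
    return window_size, register_file

print(register_windows())  # (32, 74): R0-R9 global plus R10-R73 windowed
```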



Complex Instruction Set Computer (CISC)
 Characteristics of CISC are as follows:
 A larger number of instructions – typically from 100 to 250 instructions
 Some instructions that perform specialized tasks and are used infrequently
 A large variety of addressing modes – typically from 5 to 20 different modes
 Variable-length instruction formats
 Instructions that manipulate operands in memory



Questions asked in RTU exam
1. Explain different addressing mode with example.
2. Explain register stack and memory stack with neat sketches.
3. What are RISC processors? What are the advantages of RISC architecture over traditional
CISC architecture? State important characteristics of RISC processors.
4. Explain instruction formats with its types.
5. What is PSW? Explain each bit of it.
6. Explain different types of Interrupts.
7. "Write a program to evaluate the following arithmetic statement X = [A * (B + C) - D] / (E + F -
G) i) using a general register computer with three-address instructions ii) using an
accumulator type computer with one-address instructions iii) using a stack organized
computer with zero-address operation instructions."
8. Explain Overlapped register window in RISC.



Computer Architecture and Organization

Unit-3
Pipeline & Vector
Processing

Computer Engineering Department


 Outline
• Flynn's taxonomy
• Parallel Processing
• Pipelining
• Arithmetic Pipeline
• Instruction Pipeline
• RISC Pipeline
• Vector Processing
• Array Processors
• Questions asked in RTU exam
Flynn's taxonomy

                          Data Stream
                          Single    Multiple
Instruction    Single     SISD      SIMD
Stream         Multiple   MISD      MIMD


Single Instruction Single Data & Single Instruction Multiple Data
Single Instruction Single Data (SISD)
 SISD represents the organization of a single computer containing a control unit, a processor
unit, and a memory unit.
 Instructions are executed sequentially and the system may or may not have internal parallel
processing capabilities.
Single Instruction Multiple Data (SIMD)
 SIMD represents an organization that includes many processing units under the supervision of
a common control unit.
 All processors receive the same instruction from the control unit but operate on different items
of data.



Multiple Instruction Single Data & Multiple Instruction Multiple Data
Multiple Instruction Single Data (MISD)
 There is no computer at present that can be classified as MISD.
 MISD structure is only of theoretical interest since no practical system has been constructed
using this organization.
Multiple Instruction Multiple Data (MIMD)
 MIMD organization refers to a computer system capable of processing several programs at the
same time.
 Most multiprocessor and multicomputer systems can be classified in this category.
 Contains multiple processing units.
 Execution of multiple instructions on multiple data.



Parallel Processing
 Parallel processing is a term used to denote a large class of techniques that are used to
provide simultaneous data-processing tasks for the purpose of increasing the computational
speed of a computer system.
 Purpose of parallel processing is to speed up the computer processing capability and increase
its throughput.
 Throughput:
The amount of processing that can be accomplished during a given interval of time.



Pipelining
 Pipelining is a technique of decomposing a sequential process into suboperations, with each
subprocess being executed in a special dedicated segment that operates concurrently with all
other segments.
 A pipeline can be visualized as a collection of processing segments through which binary
information flows.
 Each segment performs partial processing dictated by the way the task is partitioned.
 The result obtained from the computation in each segment is transferred to the next segment
in the pipeline.
 The registers provide isolation between each segment.
 The technique is efficient for those applications that need to repeat the same task many times
with different sets of data.



Pipelining
 Features of pipelining:
1. The partial result obtained from the computation in each segment is transferred to the next
segment in the pipeline, and so on.
2. The final result is obtained after the data have passed through all segments.
3. Several computations can be performed in different segments at the same time.



Pipelining example
𝐴𝑖 ∗ 𝐵𝑖 + 𝐶𝑖 for 𝑖 = 1,2,3, … , 7

Segment 1: R1 ← Ai,  R2 ← Bi        (load input operands)
Segment 2: R3 ← R1 * R2 (Multiplier),  R4 ← Ci
Segment 3: R5 ← R3 + R4 (Adder)

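The behaviour of this three-segment pipeline can be sketched cycle by cycle in Python (a sketch only: the register names follow the figure, while the function name and the sample operand lists are made up):

```python
# Cycle-by-cycle simulation of the three-segment pipeline computing
# Ai * Bi + Ci for i = 1..7. All pipeline registers latch simultaneously
# on each clock, so the segments are evaluated back-to-front.

def pipeline_multiply_add(A, B, C):
    """Return the stream of results appearing in R5, in task order."""
    n = len(A)
    r1 = r2 = r3 = r4 = None          # pipeline registers (None = bubble)
    results = []
    for clock in range(n + 2):        # n tasks need n + (3 - 1) clock cycles
        # Segment 3: R5 <- R3 + R4 (uses values latched last cycle)
        if r3 is not None:
            results.append(r3 + r4)
        # Segment 2: R3 <- R1 * R2, R4 <- Ci (Ci for the task loaded last cycle)
        if r1 is not None:
            r3, r4 = r1 * r2, C[clock - 1]
        else:
            r3 = r4 = None
        # Segment 1: R1 <- Ai, R2 <- Bi
        if clock < n:
            r1, r2 = A[clock], B[clock]
        else:
            r1 = r2 = None
    return results

A = [1, 2, 3, 4, 5, 6, 7]
B = [2, 2, 2, 2, 2, 2, 2]
C = [1, 1, 1, 1, 1, 1, 1]
print(pipeline_multiply_add(A, B, C))   # [3, 5, 7, 9, 11, 13, 15]
```

Segment 3 is evaluated before segments 2 and 1 overwrite the registers it reads, which models all registers being clocked at the same instant.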


Pipelining
 General structure of four segment pipeline

Input → S1 → R1 → S2 → R2 → S3 → R3 → S4 → R4, with a common clock applied to all registers



Space-time Diagram

Non-pipelined architecture (4 tasks take 16 clock cycles):

Segment:   1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16
   1      T1          T2          T3          T4
   2         T1          T2          T3          T4
   3            T1          T2          T3          T4
   4               T1          T2          T3          T4

Pipelined architecture (4 tasks take 7 clock cycles):

Segment:   1  2  3  4  5  6  7
   1      T1 T2 T3 T4
   2         T1 T2 T3 T4
   3            T1 T2 T3 T4
   4               T1 T2 T3 T4
Arithmetic Pipeline
 Usually found in high speed computers.
 Used to implement floating point operations, multiplication of fixed point numbers and similar
operations.
 Example:
 Consider an example of floating point addition and subtraction.
X = A × 10^a
Y = B × 10^b
 A and B are two fractions that represent the mantissas and a and b are the exponents.



Example of Arithmetic Pipeline
 Consider the two normalized floating-point numbers:
X = 0.9504 × 10^3, Y = 0.8200 × 10^2
 Segment-1: The larger exponent is chosen as the exponent of the result.
 Segment-2: Align the mantissas:
X = 0.9504 × 10^3, Y = 0.0820 × 10^3
 Segment-3: Addition of the two mantissas produces the sum:
Z = 1.0324 × 10^3
 Segment-4: Normalize the result:
Z = 0.10324 × 10^4
 The sub-operations that are performed in the four segments are:
1. Compare the exponents
2. Align the mantissas
3. Add or subtract the mantissas
4. Normalize the result
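The four sub-operations can be sketched in Python for the decimal example above (an illustration only: the (mantissa, exponent) tuples and the function name are our own, and real hardware would operate on binary significands):

```python
# Sketch of the four pipeline segments for floating-point addition,
# on numbers represented as (mantissa, exponent) with base 10.

def fp_add(x, y):
    (ma, ea), (mb, eb) = x, y
    # Segment 1: compare exponents by subtraction
    diff = ea - eb
    # Segment 2: choose the larger exponent and align the smaller mantissa
    if diff >= 0:
        e, mb = ea, mb / (10 ** diff)
    else:
        e, ma = eb, ma / (10 ** -diff)
    # Segment 3: add the mantissas
    m = ma + mb
    # Segment 4: normalize the result so the mantissa lies in [0.1, 1)
    while abs(m) >= 1:
        m, e = m / 10, e + 1
    while m != 0 and abs(m) < 0.1:
        m, e = m * 10, e - 1
    return m, e

# X = 0.9504 x 10^3, Y = 0.8200 x 10^2
print(fp_add((0.9504, 3), (0.8200, 2)))   # ≈ (0.10324, 4)
```

This reproduces the slide's result Z = 0.10324 × 10^4, up to floating-point rounding.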
[Figure: Pipeline for floating-point addition and subtraction]
Inputs: exponents a, b and mantissas A, B, latched in registers R between segments
Segment 1: Compare exponents by subtraction → difference
Segment 2: Choose exponent; align mantissas
Segment 3: Add or subtract mantissas
Segment 4: Adjust exponent; normalize result
Instruction Pipeline
 In the most general case, the computer needs to process each instruction with the following
sequence of steps
1. Fetch the instruction from memory.
2. Decode the instruction.
3. Calculate the effective address.
4. Fetch the operands from memory.
5. Execute the instruction.
6. Store the result in the proper place.
 Different segments may take different times to operate on the incoming information.
 Some segments are skipped for certain operations.
 The design of an instruction pipeline will be most efficient if the instruction cycle is divided into
segments of equal duration.



Instruction Pipeline
 Assume that the decoding of the instruction can be combined with the calculation of the
effective address into one segment.
 Assume further that most of the instructions place the result into a processor register, so that
the instruction execution and storing of the result can be combined into one segment.
 This reduces the instruction pipeline into four segments.
1. FI: Fetch an instruction from memory
2. DA: Decode the instruction and calculate the effective address of the operand
3. FO: Fetch the operand
4. EX: Execute the operation



Four segment CPU pipeline
Segment 1: Fetch instruction from memory
Segment 2: Decode instruction and calculate effective address
           Branch? yes → update PC and empty the pipe; no → continue
Segment 3: Fetch operand from memory
Segment 4: Execute instruction
           Interrupt? yes → interrupt handling, update PC, empty the pipe
                      no  → continue with the next instruction



Space-time Diagram

Step:            1  2  3  4  5  6  7  8  9 10 11 12 13
Instruction 1   FI DA FO EX
            2      FI DA FO EX
            3         FI DA FO EX
            4            FI DA FO EX
            5               FI DA FO EX
            6                  FI DA FO EX
            7                     FI DA FO EX
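The diagram follows a simple rule: with no conflicts, instruction i (1-based) occupies segment s during clock cycle i + s - 1, so n instructions complete in k + n - 1 cycles. A small sketch (function and variable names are ours):

```python
# Build the space-time schedule for an n-instruction, conflict-free
# pipeline: instruction i is in stage s at clock cycle i + s - 1.

def schedule(n, stages=("FI", "DA", "FO", "EX")):
    k = len(stages)
    table = {}
    for i in range(1, n + 1):
        for s, name in enumerate(stages, start=1):
            table[(i, name)] = i + s - 1
    return table, k

table, k = schedule(7)
print(table[(7, "EX")], k + 7 - 1)   # 10 10  -> 7 instructions finish in 10 cycles
```

The last entry of the table equals k + n - 1, matching the diagram's final EX at step 10.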


Space-time Diagram

Step:            1  2  3  4  5  6  7  8  9 10 11 12 13
Instruction 1   FI DA FO EX
            2      FI DA FO EX
  (Branch)  3         FI DA FO EX
            4            FI  -  - FI DA FO EX
            5                -  -  - FI DA FO EX
            6                        FI DA FO EX
            7                           FI DA FO EX


Pipeline Conflict & Data Dependency
Pipeline Conflict
 There are three major difficulties that cause instruction pipeline conflicts:
1. Resource conflicts caused by access to memory by two segments at the same time. Most of these conflicts
can be resolved by using separate instruction and data memories.
2. Data dependency conflicts arise when an instruction depends on the result of a previous instruction, but this
result is not yet available.
3. Branch difficulties arise from branch and other instructions that change the value of PC.
Data Dependency
 Data dependency occurs when an instruction depends on the result of a previous instruction,
but this result is not yet available.
 Pipelined computers deal with such data dependency conflicts in a variety of ways, as
follows:
1. Hardware Interlocks
2. Operand forwarding
3. Delayed load (Software Method)



Handling Branch Instructions
 The branch instruction breaks the normal sequence of the instruction stream, causing
difficulties in the operation of the instruction pipeline.
 Hardware techniques available to minimize the performance degradation caused by instruction
branching are as follows:
 Pre-fetch target
 Branch target buffer
 Loop buffer
 Branch prediction
 Delayed branch



Performance of a pipelined processor
 Consider a ‘k’-segment pipeline with clock cycle time ‘Tp’
 ‘n’ tasks to be completed in the pipelined processor
 Time taken to execute ‘n’ instructions in a pipelined processor
ETpipeline = k + n – 1 cycles = (k + n – 1) Tp
 A non-pipelined processor, execution time of ‘n’ instructions will be
ET non-pipeline = n * k * Tp
 Speedup (S)
S = Performance of pipelined processor / Performance of Non-pipelined processor
 Efficiency
Efficiency = Speedup / Maximum speedup = S / Smax = S / k
 Throughput
Throughput = Number of instructions / Total time to complete the instructions
Throughput = n / [(k + n – 1) * Tp]
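Plugging hypothetical numbers into the formulas above (a sketch; the example values are made up):

```python
# Execution time, speedup, efficiency and throughput for a k-segment
# pipeline executing n tasks with clock cycle time tp (here in ns).

def pipeline_metrics(n, k, tp):
    et_pipe = (k + n - 1) * tp      # ETpipeline = (k + n - 1) * Tp
    et_nonpipe = n * k * tp         # ETnon-pipeline = n * k * Tp
    speedup = et_nonpipe / et_pipe
    efficiency = speedup / k        # maximum speedup Smax = k
    throughput = n / et_pipe        # tasks completed per ns
    return et_pipe, et_nonpipe, speedup, efficiency, throughput

print(pipeline_metrics(100, 4, 10))
# 100 tasks, 4 segments, Tp = 10 ns -> speedup ≈ 3.88, efficiency ≈ 0.97
```

As n grows, the speedup approaches the theoretical maximum of k = 4.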
Speedup
 Speedup of pipeline processing over an equivalent non-pipeline processing is defined by the
ratio
S = n·tn / [(k + n – 1)·tp]
 If the number of tasks increases relative to the number of segments, n becomes much larger
than k – 1; under this condition the speedup becomes
S = n·tn / (n·tp) = tn / tp
 Assuming the time to process a task is the same in the pipeline and non-pipeline circuits,
tn = k·tp, and therefore
S = k·tp / tp = k
 Theoretically, the maximum speedup achievable is k, the number of segments in the pipeline.



Numerical
 Consider a non-pipelined processor with a clock rate of 2.5 GHz and an average of four cycles
per instruction. The same processor is upgraded to a pipelined processor with five stages, but
due to internal pipeline delay the clock speed is reduced to 2 GHz. Assuming there are no
stalls in the pipeline, find the speedup achieved by the pipelined processor.

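A worked solution in Python, under the usual assumption that the pipelined processor completes one instruction per cycle when there are no stalls:

```python
# Speedup = (time per instruction, non-pipelined) / (time per instruction, pipelined)
# Non-pipelined: 2.5 GHz, CPI = 4 -> 4 / 2.5 = 1.6 ns per instruction
# Pipelined:     2.0 GHz, CPI = 1 -> 1 / 2.0 = 0.5 ns per instruction

def speedup(f_old_ghz, cpi_old, f_new_ghz, cpi_new):
    t_old = cpi_old / f_old_ghz   # ns per instruction before the upgrade
    t_new = cpi_new / f_new_ghz   # ns per instruction after the upgrade
    return t_old / t_new

print(speedup(2.5, 4, 2.0, 1))   # 3.2
```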


RISC Pipeline (Three segment instruction pipeline)
 Because of the fixed-length instruction format, the decoding of the operation can occur at the
same time as the register selection.
 Since all the operands are in registers, there is no need for calculating an effective address or
fetching of operands from memory.
 RISC has main two facilities:
 Single-cycle instruction execution
 Compiler support
 Mainly three types of instructions in RISC:
1. Data manipulation instructions
2. Data transfer instructions
3. Program control instructions
 The instruction cycle can be divided into three sub-operations and implemented in three
segments:
 I : Instruction fetch
 A : ALU operation
 E : Execute instruction
Delayed Load
Pipeline timing with data conflict:

Clock cycles:   1  2  3  4  5  6
Load R1         I  A  E
Load R2            I  A  E
Add R1+R2             I  A  E
Store R3                 I  A  E

Pipeline timing with delayed load:

Clock cycles:   1  2  3  4  5  6  7
Load R1         I  A  E
Load R2            I  A  E
No-operation          I  A  E
Add R1+R2                I  A  E
Store R3                    I  A  E


Delayed Branch

Using no-operation instructions:

Clock cycle:       1  2  3  4  5  6  7  8  9 10
Load R1            I  A  E
Increment R2          I  A  E
Add R3+R4                I  A  E
Subtract R6-R5              I  A  E
Branch to X                    I  A  E
No operation                      I  A  E
No operation                         I  A  E
Instruction in X                        I  A  E



Delayed Branch

Rearranging the instructions:

Clock cycle:       1  2  3  4  5  6  7  8
Load R1            I  A  E
Increment R2          I  A  E
Branch to X              I  A  E
Add R3+R4                   I  A  E
Subtract R6-R5                 I  A  E
Instruction in X                  I  A  E



Vector Processing
 In many science and engineering applications, the problems can be formulated in terms of
vectors and matrices that lend themselves to vector processing.
 Applications of Vector processing
1. Long-range weather forecasting
2. Petroleum explorations
3. Seismic data analysis
4. Medical diagnosis
5. Aerodynamics and space flight simulations
6. Artificial intelligence and expert systems
7. Mapping the human genome
8. Image processing
Matrix Multiplication
 Matrix multiplication is one of the most computationally intensive operations performed in
computers with vector processors.



Vector Processing
 An n x m matrix of numbers has n rows and m columns and may be considered as constituting
a set of n row vectors or a set of m column vectors.
 Consider, for example, the multiplication of two 3x3 matrices A and B.
[a11 a12 a13]   [b11 b12 b13]   [c11 c12 c13]
[a21 a22 a23] × [b21 b22 b23] = [c21 c22 c23]
[a31 a32 a33]   [b31 b32 b33]   [c31 c32 c33]
 The product matrix C is a 3 x 3 matrix whose elements are related to the elements of A and B
by the inner product:
c_ij = Σ (k = 1 to 3) a_ik × b_kj

 The total number of multiplications or additions required to compute the matrix product is 9 x 3
= 27.
[Figure: Source A and Source B feed a multiplier pipeline whose output feeds an adder pipeline]

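The inner product above maps directly onto the familiar triple loop; this sketch also counts multiplications to confirm the 9 x 3 = 27 figure for the 3 x 3 case (function and variable names are ours):

```python
# Triple-loop matrix product: c[i][j] = sum over k of a[i][k] * b[k][j].
# A counter tracks how many scalar multiplications are performed.

def matmul(a, b):
    n, m, p = len(a), len(b[0]), len(b)
    muls = 0
    c = [[0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            for k in range(p):
                c[i][j] += a[i][k] * b[k][j]
                muls += 1
    return c, muls

I3 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]   # 3x3 identity
A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
c, muls = matmul(A, I3)
print(muls)   # 27
```

A vector processor pipelines exactly this innermost multiply-add stream through the multiplier and adder pipelines of the figure.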


Memory Interleaving
[Figure: Four-way memory interleaving. Each of the four memory modules has its own address
register (AR) and data register (DR); all modules connect to a common address bus and a
common data bus]

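The figure can be modelled with low-order interleaving, the common scheme in which the least significant address bits select the module (an assumption; the slides do not specify the bit assignment):

```python
# Low-order interleaving across four modules: the two least significant
# address bits select the module, so consecutive addresses fall in
# different modules and can be accessed in parallel.

def interleave(address, modules=4):
    module = address % modules    # which module's AR receives the address
    word = address // modules     # offset within that module's memory array
    return module, word

for addr in range(8):
    print(addr, interleave(addr))
# addresses 0..7 map to modules 0, 1, 2, 3, 0, 1, 2, 3
```

Because a burst of sequential addresses touches each module once per four accesses, the modules' access times overlap.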


Array Processors
 An array processor (also called a vector processor) is a processor that performs
computations on large arrays of data; it is used to improve the performance of the computer
on such workloads.

 Two different types of processors

1. Attached array processor: It is an auxiliary processor attached to a general-purpose computer.

2. SIMD array processor: It is a processor that has a single-instruction multiple-data organization.



1. Attached Array Processors
 An attached array processor is an auxiliary processor attached to a general-purpose
computer to enhance that computer's performance in numerical computational tasks. It
achieves high performance by means of parallel processing with multiple functional units.

[Figure: The general-purpose computer connects to the attached array processor through an
input-output interface; the computer's main memory and the processor's local memory are
linked by a high-speed memory-to-memory bus]


SIMD Array Processor
SIMD is the organization of a single computer containing multiple processors operating in parallel. The processing units are
made to operate under the control of a common control unit, thus providing a single instruction stream and multiple data
streams.

[Figure: A master control unit, connected to main memory, broadcasts instructions to
processing elements PE1 … PEn, each paired with its own local memory M1 … Mn]



Questions asked in RTU exam
1. Discuss four-segment instruction pipeline with diagram(s).
2. Explain Flynn's taxonomy for classifying parallel processors. Explain each class.
3. Draw space-time diagram for 4-segment pipeline with 8 tasks.
4. What is speedup? Derive the equation of speedup for k-segment pipeline processing of n
tasks.
5. Write note on memory interleaving
6. Compare SIMD and MIMD.
7. Explain pipeline processing conflict.
8. Explain vector operation.
9. Explain parallel processing.



Questions asked in RTU exam
10. Explain pipelining. How does it enhance the CPU performance?
11. Explain Vector Processing.
12. Describe SIMD array processor.
13. Explain Delay load.
14. Explain term: Pipeline Conflict.
15. State any two solutions for handling branch difficulties.
16. What do you mean by speed-up in context of pipelining?
17. Data dependency

