Module 3 Bcse


Allocation of Bits

• For a two-operand arithmetic instruction, five items need to be specified
• Operation to be performed (opcode)
• Location of the first operand
• Location of the second operand
• Place to store the result
• Location of next instruction to be executed
• Assumptions
• 24-bit memory address (3 bytes)
4-address machines and operations
• Explicit addresses for 2 operands, results and next
instruction
• So, 4 * 3 + 1 = 13 bytes are required for 4-address
instruction

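Example: Add Res, Op1, Op2, Nexti   (Res <- Op1 + Op2)

[Figure: the CPU reads Op1 and Op2 from memory, adds them, and writes the sum to Res; Nexti holds the address of the next instruction.]

Instruction format:
Field:  Opcode | Res Addr | Op1 Addr | Op2 Addr | Nexti Addr
Bits:   8      | 24       | 24       | 24       | 24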
4-address instruction contd.,
• A layout of the instruction in the memory might appear as
shown below

• Number of memory accesses:


– When instruction is fetched: 5
– To fetch the operands: 2
– To store the result: 1
– Total: 8
3-address machines and operations
• Explicit addresses for 2 operands and the result. The next instruction
address is stored in the Program Counter register (except for
branch instructions)
• So, 3 * 3 + 1 = 10 bytes are required for 3-address instruction

Example: Add Res, Op1, Op2   (Res <- Op1 + Op2)

[Figure: the CPU reads Op1 and Op2 from memory, adds them, and writes the sum to Res; the 24-bit Program Counter holds Nexti, the address of the next instruction.]

Instruction format:
Field:  Opcode | Res Addr | Op1 Addr | Op2 Addr
Bits:   8      | 24       | 24       | 24
3-address instruction contd.,
• Number of memory accesses:
• When instruction is fetched: 4
• To fetch the operands: 2
• To store the result: 1
• Total: 7
2-address machines and operations
• Result overwrites Operand 2, so needs only 2 addresses in the instruction
• The next instruction address is stored in the program Counter register
(except for branch instructions)
• So, 2 * 3 + 1 = 7 bytes are required for 2-address instruction

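Example: Add Op1, Op2   (Op2 <- Op1 + Op2)

[Figure: the CPU reads Op1 and Op2 from memory, adds them, and writes the sum back to the Op2 location; the 24-bit Program Counter holds Nexti, the address of the next instruction.]

Instruction format:
Field:  Opcode | Op1 Addr | Op2 Addr
Bits:   8      | 24       | 24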
2-address instruction contd.,
• Number of memory accesses:
• When instruction is fetched: 3
• To fetch the operands: 2
• To store the result: 1
• Total: 6
1-address machines and operations
• Special CPU register, the accumulator, supplies 1 operand and stores result
• One memory address used for other operand
• Need instructions to load and store operands:
– LDA OpAddr
– STA OpAddr
• This Instruction requires 1 * 3+1= 4 bytes

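Example: Add Op1   (Acc <- Acc + Op1)

[Figure: the CPU reads Op1 from memory, adds it to the Accumulator, and leaves the sum in the Accumulator; the 24-bit Program Counter holds Nexti, the address of the next instruction.]

Instruction format:
Field:  Opcode | Op1 Addr
Bits:   8      | 24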
1-address instruction contd.,

• Number of memory accesses:


• When instruction is fetched: 2
• To fetch the operands: 1
• To store the result: 0
• Total: 3
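
The byte counts and memory-access totals quoted above for the 4-, 3-, 2-, and 1-address formats follow one pattern, which the short sketch below reproduces. It assumes the slides' own counting (one access for the opcode plus one per address field); the function names are illustrative and not part of the original material.

```python
ADDRESS_BYTES = 3  # 24-bit memory address, as assumed above
OPCODE_BYTES = 1   # 8-bit opcode


def instruction_bytes(num_addresses):
    """Size in bytes of an instruction with `num_addresses` explicit addresses."""
    return num_addresses * ADDRESS_BYTES + OPCODE_BYTES


def memory_accesses(num_addresses):
    """Memory accesses for one two-operand arithmetic instruction."""
    fetch = 1 + num_addresses  # opcode plus each address field
    # A 1-address machine takes one operand from the accumulator
    # and leaves the result there, so it needs no store.
    operand_reads = 2 if num_addresses >= 2 else 1
    result_writes = 1 if num_addresses >= 2 else 0
    return fetch + operand_reads + result_writes


for n in (4, 3, 2, 1):
    print(f"{n}-address: {instruction_bytes(n)} bytes, {memory_accesses(n)} accesses")
```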
The 0-address, or stack, machine and Instruction Format
• Uses a push down stack in CPU
• Uses stack for both operands and the result
• Computer must have a 1-address instruction to push
and pop operands to and from the stack
• Number of memory access required:
• For push and pop:
• to fetch an instruction: 2
• to fetch an operand value: 1
• Total 3
• For operations: 1 (to fetch an instruction)
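Examples:
push Op1   (TOS <- Op1)
Add        (TOS <- TOS + SOS)

[Figure: push copies Op1 from memory to the top of the stack (TOS) in the CPU; Add combines TOS and the second item on the stack (SOS) and leaves the sum in TOS; the 24-bit Program Counter holds Nexti, the address of the next instruction.]

Instruction formats:
push:  Opcode (8 bits) | Op1 Addr (24 bits)
Add:   Opcode (8 bits)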
Problems
• Evaluate a = (b+c)*d - e in 3-, 2-, 1-, 0-address machines

3-address     2-address     1-address     0-address
add a,b,c     load a,b      Load b        Push b
mul a,a,d     Add a,c       Add c         Push c
sub a,a,e     Mul a,d       Mul d         Add
              Sub a,e       Sub e         Push d
                            Store a       Mul
                                          Push e
                                          Sub
                                          Pop a
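
The 0-address column can be traced with a small interpreter. This is only a sketch: the operand values are made up, and it assumes that a binary operation pops TOS and SOS and pushes "SOS op TOS", which is the usual convention for subtraction on a stack machine.

```python
def run_stack_program(program, memory):
    """Execute a list of 0-address instructions against a dict-based memory."""
    stack = []
    for line in program:
        op, *args = line.split()
        op = op.lower()
        if op == "push":                 # 1-address: copy memory operand to TOS
            stack.append(memory[args[0]])
        elif op == "pop":                # 1-address: store TOS to memory
            memory[args[0]] = stack.pop()
        else:                            # 0-address arithmetic on TOS and SOS
            tos = stack.pop()
            sos = stack.pop()
            if op == "add":
                stack.append(sos + tos)
            elif op == "sub":
                stack.append(sos - tos)
            elif op == "mul":
                stack.append(sos * tos)
            else:
                raise ValueError(f"unknown instruction: {line}")
    return memory


PROGRAM = ["Push b", "Push c", "Add", "Push d", "Mul", "Push e", "Sub", "Pop a"]
values = {"b": 2, "c": 3, "d": 4, "e": 5}   # hypothetical operand values
run_stack_program(PROGRAM, values)
print(values["a"])  # (2 + 3) * 4 - 5 = 15
```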
Assignment

• Write a program to evaluate the arithmetic statement:

X = (A - B + C * (D * E - F)) / (G + H * K)

• Using a general register computer with three address instructions
• Using a general register computer with two address instructions
• Using an accumulator type computer with one address instructions
• Using a stack organized computer with zero-address operation
instructions
Addressing Modes
• Specify the way the operands are selected during program
execution.
• Usage
1. To give programming flexibility to the user
• pointers to memory, counters for loop control, indexing of data, ….
2. To reduce the number of bits in the addressing field of the
inst.

Dr. S.Meenatchi, SITE, VIT


Different Types of addressing mode
1. Implied Addressing Mode
2. Immediate Addressing Mode
3. Direct Addressing Mode
4. Indirect Addressing Mode
5. Register Direct Addressing Mode
6. Register Indirect Addressing Mode
7. Displacement Addressing Mode (combines the direct
addressing and register indirect addressing modes)
(i) Relative Addressing Mode
(ii) Indexed Addressing Mode
(iii) Base Addressing Mode
8. Auto Increment and Auto Decrement Addressing Mode



1. Implied Mode
• Operands are specified implicitly in the instruction.
• No address field is required
• 0-address inst. are implied mode inst.
• Used by Stack-organized computer
• Examples:
– COM : Complement Accumulator
• Operand in AC is implied in the inst.
– ADD
• Operands are implied to be on top of the stack.
• Effective Address (EA) = AC or Stack[SP]



2. Immediate Mode
• Operand is specified in the inst. itself.
• Operand field contains the actual operand.
• Useful for initializing registers to a constant value.
• Advantage: No memory Reference, fast
• Disadvantage: Limited operand magnitude
• Example:
– LD #NBR
– Mov R1, #200
• Instruction format: Opcode | Operand



3. Register Mode
• Operands are in registers.
• Register is selected from a register field in the inst.
– A k-bit register field can specify any one of 2^k registers.
• Advantage: No memory reference, shorter instructions, faster
instruction fetch, very fast execution
• Disadvantage: Limited address space as limited number of
registers
• Example:
– LOAD R1

– MOV R1,R2



4. Register Indirect Mode
• Register contains the address of the operand.
• Advantage: address field of the inst. uses fewer bits to select a
register than bits required to select a memory address
• Disadvantage: Extra memory reference
• Example:
– LOAD (R1)

– MOV R1,(R2)



5. Autoincrement or Autodecrement Mode
• Similar to the register indirect mode except that
– the register is incremented after its value is used to access memory (autoincrement)
– the register is decremented before its value is used to access memory (autodecrement)
• Example (Autoincrement):
– LOAD (R1)+

• 2 forms, post and pre:
– Mov R1,(R2)+ → post increment
– Mov R1,+(R2) → pre increment
– Mov R1,(R2)- → post decrement
– Mov R1,-(R2) → pre decrement



6. Direct Addressing Mode
• Operand resides in memory and its address is given in the inst.
• In a branch-type inst. address field specifies the actual branch
address.
• Advantage: Simple memory reference to access data, no
additional calculations to work out effective address
• Disadvantage: Limited address space
• Example:
– LOAD ADR

– MOV R1,2000



7. Indirect Addressing Mode
• Address field of inst. gives the address where the effective
address is stored in memory
• Advantage: Large address space, may be nested, multilevel or
cascaded
• Disadvantage: Multiple memory accesses to find the operand,
hence slower
• Example:
– LOAD @ADR

– MOV R1,(2000)



8. Displacement Addressing Mode
• EA = A + (R)
• Address field holds two values
– A = Base value
– R = register that holds displacement
– Or vice-versa



(i) Relative Addressing Mode
• PC is added to the address part of the instruction to obtain the
effective address
• EA= [PC] + address part of the inst.
• Advantage: Flexibility
• Disadvantage: Complexity
• Example:
– LOAD $ADR

• Program Counter (PC)
– Keeps track of the insts. in the program stored in memory.
– Holds the address of the inst. to be executed next.
– Incremented each time an inst. is fetched from memory
(ii) Indexed Addressing Mode
• XR (Index register) is added to the address part of the
instruction to obtain the effective address
• EA= [Index reg.] + address part of the inst.
• Advantage: Flexibility, good for accessing arrays
• Disadvantage: Complexity
• Example:
– LOAD ADR(XR)



(iii) Base Register Addressing Mode
• EA=Content of a base register + address part of the inst.
• Similar to the indexed addressing mode except that the register
is now called a base register (BR) instead of an index register.
• Example:
– LOAD ADR(BR)

• Base register holds a base address
• The address field of the instruction gives a displacement
relative to this base address



Data Transfer Instructions with Different
Addressing Modes
@ : Indirect Address
$ : Address relative to PC
# : Immediate Mode
( ) : Index Mode, Register Indirect, Autoincrement



Basic addressing modes differences

Mode               Algorithm           Advantage             Disadvantage
Immediate          Operand = A         No memory reference   Limited operand magnitude
Direct             EA = A              Simple                Limited address space
Indirect           EA = (A)            Large address space   Multiple memory references
Register           EA = R              No memory reference   Limited address space
Register indirect  EA = (R)            Large address space   Extra memory reference
Displacement       EA = A + (R)        Flexibility           Complexity
Stack              EA = top of stack   No memory reference   Limited applicability
Problems
• Find the effective address and the content of AC for the given data.



Addressing Mode      Effective Address   Operation           Content of AC
Direct address       500                 AC ← (500)          800
Immediate operand    201                 AC ← 500            500
Indirect address     800                 AC ← ((500))        300
Relative address     702                 AC ← (PC + 500)     325
Indexed address      600                 AC ← (XR + 500)     900
Register             -                   AC ← R1             400
Register indirect    400                 AC ← (R1)           700
Autoincrement        400                 AC ← (R1)+          700
Autodecrement        399                 AC ← -(R1)          450
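
The table can be reproduced with a few lines of code. The register and memory contents below are not given in the surviving text; they are inferred from the table rows (for example, PC = 202 from 702 - 500 and XR = 100 from 600 - 500), so treat them as assumptions. The function name `resolve` is illustrative.

```python
# Memory contents inferred from the table (address -> value); assumed, not quoted.
MEMORY = {399: 450, 400: 700, 500: 800, 600: 900, 702: 325, 800: 300}
ADDRESS_FIELD = 500            # address part of the instruction
ADDRESS_FIELD_LOCATION = 201   # where the address part itself is stored
PC = 202                       # PC after the instruction has been fetched
R1 = 400                       # processor register
XR = 100                       # index register


def resolve(mode):
    """Return (effective_address, value_loaded_into_AC) for one addressing mode."""
    if mode == "immediate":
        return ADDRESS_FIELD_LOCATION, ADDRESS_FIELD   # operand is in the instruction
    if mode == "register":
        return None, R1                                # operand is in R1, no EA
    if mode == "direct":
        ea = ADDRESS_FIELD
    elif mode == "indirect":
        ea = MEMORY[ADDRESS_FIELD]
    elif mode == "relative":
        ea = PC + ADDRESS_FIELD
    elif mode == "indexed":
        ea = XR + ADDRESS_FIELD
    elif mode in ("register indirect", "autoincrement"):
        ea = R1          # autoincrement uses R1 first and increments it afterwards
    elif mode == "autodecrement":
        ea = R1 - 1      # R1 is decremented before it is used
    else:
        raise ValueError(f"unknown mode: {mode}")
    return ea, MEMORY[ea]


for mode in ("direct", "immediate", "indirect", "relative", "indexed",
             "register", "register indirect", "autoincrement", "autodecrement"):
    print(mode, resolve(mode))
```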



Numerical Example



Instruction Types

Computer System Architecture


By
M. Morris Mano

Dr.S.Meenatchi, SITE, VIT


Classification of computer instructions
• Most computer instructions can be classified into three
categories:
1. Data transfer
2. Data manipulation
3. Program control instructions



1. Data Transfer Instruction
• Move data from one place to another without changing the
data content in the computer.
• Different data transfers:
1. Between memory and processor registers
2. Between processor registers and input or output
3. Between processor registers themselves



1. Data Transfer Instruction cont..
• Load: transfer from memory to a processor register, usually an
AC (memory read)
• Store: transfer from a processor register into memory (memory
write)
• Move: transfer from one register to another register
• Exchange: swap information between two registers or a
register and a memory word
• Input/Output: transfer data between processor registers and
input/output devices
• Push: transfer data from a processor register to the top of a
memory stack
• Pop: transfer data from the top of the stack to a processor register



2. Data Manipulation Instruction
• Perform operations on data and provide the computational
capabilities for the computer.
I. Arithmetic,
II. Logical and bit manipulation,
III. Shift Instruction



I. Arithmetic Instructions
• Increment
• Decrement
• Add
• Subtract
• Multiply
• Divide
• Add with carry
• Subtract with borrow
• Negate (2’s complement) – change the sign of the operand
• Absolute – replace operand by its absolute value.



I. Arithmetic Instructions cont..
Operation Name   Description
Add              Computes the sum of two operands
Subtract         Computes the difference between two operands
Multiply         Computes the product of two operands
Divide           Computes the quotient of two operands
Absolute         Replaces the operand by its absolute value
Negate           Changes the sign of the operand
Increment        Adds 1 to the operand
Decrement        Subtracts 1 from the operand
II. Logical and Bit Manipulation Instructions
• Clear (can also be included in data transfer instruction based
on the way the operation is performed – 0’s transferred to
the destination)
• Complement
• AND – to clear selected bits
• OR – to set selected bits
• Ex-Or – to complement selected bits
• Clear carry
• Set carry
• Complement carry
• Enable interrupt – the flip-flop that controls the interrupt facility
is enabled
• Disable interrupt – the flip-flop that controls the interrupt
facility is disabled


III. Shift Instructions
1. Logical left shift
2. Logical right shift
3. Arithmetic shift left
4. Arithmetic shift right
5. Rotate right
6. Rotate left
7. Rotate right through carry
8. Rotate left through carry



• Logical shift left (a 0 enters at the right; the leftmost bit is shifted out)
0 0 0 1 1 0 1 0   Initial value
0 0 1 1 0 1 0 0   After shifting by 1 bit towards left
0 1 1 0 1 0 0 0   After shifting two times
1 1 0 1 0 0 0 0   After shifting three times
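• Logical shift right (a 0 enters at the left; the rightmost bit is shifted out)
0 0 0 1 1 0 1 0   Initial value
0 0 0 0 1 1 0 1   After shifting by 1 bit towards right (bit shifted out: 0)
0 0 0 0 0 1 1 0   After shifting two times (bit shifted out: 1)
0 0 0 0 0 0 1 1   After shifting three times (bit shifted out: 0)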


• Arithmetic shift left (the leftmost bit is the sign bit)
0 0 0 1 1 0 1 0   Initial value, sign bit 0
0 0 1 1 0 1 0 0   After shifting by 1 bit towards left, sign bit 0
0 1 1 0 1 0 0 0   After shifting two times, sign bit 0
1 1 0 1 0 0 0 0   After shifting three times, sign bit 1: overflow occurs as the sign bit changes
• Arithmetic shift right: -ve values are in 2’s complement form (the sign bit is copied into the vacated leftmost position)
1 0 0 1 1 0 1 0   Initial value, sign bit 1
1 1 0 0 1 1 0 1   After shifting by 1 bit towards right
1 1 1 0 0 1 1 0   After shifting two times
1 1 1 1 0 0 1 1   After shifting three times


• Rotate left (the bit leaving the left end is held in a buffer and re-enters at the right end)
0 0 0 1 1 0 1 0   Initial value
0 0 1 1 0 1 0 0   After rotating by 1 bit towards left
0 1 1 0 1 0 0 0   After rotating two times
1 1 0 1 0 0 0 0   After rotating three times
• Rotate right (the bit leaving the right end is held in a buffer and re-enters at the left end)
0 0 0 1 1 0 1 0   Initial value
0 0 0 0 1 1 0 1   After rotating by 1 bit towards right
1 0 0 0 0 1 1 0   After rotating two times
0 1 0 0 0 0 1 1   After rotating three times
• Rotate left through carry (the carry bit and the 8-bit value rotate together as a 9-bit unit; carry is initially 0)
Carry   Value
0       0 0 0 1 1 0 1 0   Initial value
0       0 0 1 1 0 1 0 0   After rotating by 1 bit towards left
0       0 1 1 0 1 0 0 0   After rotating two times
0       1 1 0 1 0 0 0 0   After rotating three times
• Rotate right through carry (the carry bit and the 8-bit value rotate together as a 9-bit unit; carry is initially 0)
Carry   Value
0       0 0 0 1 1 0 1 0   Initial value
0       0 0 0 0 1 1 0 1   After rotating by 1 bit towards right
1       0 0 0 0 0 1 1 0   After rotating two times
0       1 0 0 0 0 0 1 1   After rotating three times
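
The eight shift and rotate operations can be written as one-line bit manipulations on 8-bit values. This is a sketch for checking the 8-bit examples by hand; the function names are illustrative, and the rotate-through-carry functions assume the carry and the value rotate together as a 9-bit unit.

```python
MASK = 0xFF  # keep results to 8 bits


def bits(x):
    """Format an 8-bit value as a binary string."""
    return format(x, "08b")


def shl(x):
    """Logical (and arithmetic) shift left: a 0 enters at the right."""
    return (x << 1) & MASK


def shr(x):
    """Logical shift right: a 0 enters at the left."""
    return x >> 1


def asr(x):
    """Arithmetic shift right: the sign bit is copied into the vacated position."""
    return (x >> 1) | (x & 0x80)


def asl_overflows(x):
    """True if an arithmetic shift left would change the sign bit."""
    return bool((x ^ shl(x)) & 0x80)


def rol(x):
    """Rotate left: the bit leaving the left end re-enters at the right."""
    return ((x << 1) & MASK) | (x >> 7)


def ror(x):
    """Rotate right: the bit leaving the right end re-enters at the left."""
    return (x >> 1) | ((x & 1) << 7)


def rcl(x, carry):
    """Rotate left through carry; returns (new_value, new_carry)."""
    return ((x << 1) & MASK) | carry, x >> 7


def rcr(x, carry):
    """Rotate right through carry; returns (new_value, new_carry)."""
    return (x >> 1) | (carry << 7), x & 1


value = 0b00011010
for _ in range(3):
    value = ror(value)
    print(bits(value))
```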
3. Program control Instructions

Operation Name       Description
Jump unconditional   Unconditional transfer
Jump conditional     Test specified condition
Jump to subroutine   Jump to specified address
Return               Replace the content of PC
Execute              Execute instructions
Skip                 Increment PC to skip next instruction
Skip conditional     Test condition for skip
Halt                 Stop program execution
Wait (hold)          Stop program execution and resume when condition is satisfied
No operation         No operation performed but program execution continues
PERFORMANCE
1. PROCESSOR CLOCK
2. BASIC PERFORMANCE EQUATION

Dr.S.Meenatchi, SCORE, VIT, Vellore


PERFORMANCE
• The most important measure of performance of a computer is how quickly it
can execute programs.
• Speed of a computer is affected by design of its hardware and its machine
language instructions.
• Programs are usually written in a high-level language
o Performance is also affected by compiler that translates programs into
machine language.
• For best performance, it is necessary to design compiler, machine instruction
set, and hardware in a coordinated way.
• We concentrate on design of instruction sets and hardware.



PERFORMANCE
• How the operating system overlaps processing,
disk transfers, and printing for several programs
to make use of resources available.
• Elapsed time
• Total time required to execute the program.
• Ex: it is the interval from t0 to t5, i.e. t5 - t0.
• It is a measure of the performance of entire
computer system.
• It is affected by the speed of processor, disk,
and printer.
• It depends on all units in a computer system.
PERFORMANCE
• Processor time
• Time needed to execute the program by the processor.
• It is a measure of performance of processor.
• It is the sum of the periods during which the processor is active.
• It depends on the hardware involved in the execution of individual
machine instructions.
• Hardware comprises processor and memory, which are connected by a
bus.



PERFORMANCE
Let us examine the flow of program instructions and data between memory
and processor.
• At the start of execution, all program instructions and required data are stored
in main memory.
• As execution proceeds, instructions are fetched one by one over the bus into
processor, and a copy is placed in cache.
• When execution of an instruction calls for data located in main memory, data are
fetched and a copy is placed in cache.
• Later, if same instruction or data item is needed a second time, it is read directly
from cache.
PERFORMANCE
• Processor and a small cache memory can be fabricated on a single integrated
circuit chip.
• Internal speed of performing instruction processing on such chips is very high
and is considerably faster than the speed at which instructions and data can be
fetched from main memory.
• Program will be executed faster if movement of instructions and data between
main memory and processor is minimized, which is achieved by using cache.
• Example: Program loop: a number of instructions are executed repeatedly over a
short period of time
• If these instructions are available in cache, they can be fetched quickly
during the period of repeated use.
• Same applies to data that are used repeatedly.
1. PROCESSOR CLOCK
• Processor circuits are controlled by a timing signal called a clock.
• Clock defines regular time intervals, called clock cycles.
• To execute a machine instruction, processor divides the action to be performed
into a sequence of basic steps, such that each step can be completed in one clock
cycle.
• Length of one clock cycle is an important parameter that affects processor
performance. It is denoted by P.
• Its inverse is the clock rate, R = 1/P, which is measured in cycles per second, a unit
called hertz (Hz).
• Processors used in personal computers and workstations have clock rates that
range from a few hundred million to over a billion cycles per second.
1. PROCESSOR CLOCK
• Term “cycles per second” is called hertz (Hz).

• Term “million” is denoted by prefix Mega (M).

• Term “billion” is denoted by prefix Giga (G).

• Hence, 500 million cycles per second is usually abbreviated to 500 Megahertz
(MHz).

• 1250 million cycles per second is abbreviated to 1.25 Gigahertz (GHz).



2. BASIC PERFORMANCE EQUATION
• Let T be processor time required to execute a program.
• Assume that complete execution of the program requires the execution of N
machine language instructions.
• N is the actual number of instruction executions.
• N is not equal to number of machine instructions in the object program.
• Some instructions may be executed more than once. Ex: program loop.
• Others may not be executed at all.
• Suppose that average number of steps needed to execute one machine instruction
is S, where each basic step is completed in one clock cycle.
• If clock rate is R cycles per second, program execution time is given by

T = (N * S) / R

• This is often referred to as the basic performance equation.


2. BASIC PERFORMANCE EQUATION
• To achieve high performance, the computer designer must seek ways to reduce the
value of T, which means reducing N and S, and increasing R.
• Value of N is reduced if the source program is compiled into fewer machine
instructions.
• Value of S is reduced if instructions have a smaller number of basic steps to
perform or if the execution of instructions is overlapped.
• R can be increased using a higher-frequency clock, which means that time
required to complete a basic execution step is reduced.

Problem-1:
A CPU is driven by 2 GHz clock.
a) Compute the duration of one clock cycle.
b) Assume that on average the execution of an instruction takes 4 clock cycles. Compute
the performance of the CPU in terms of MIPS (millions of instructions per second).
c) Assume that executing a specific program of 400 million instructions takes 2 seconds.
How many clock cycles does it take on average to execute an instruction of this
program?
Solution (a):
Clock rate (number of clock cycles per second) R = 2 GHz
= 2 * 10^9 Hz
Duration of one clock cycle P = 1/R = 1 / (2 * 10^9)
= 0.5 * 10^-9 seconds = 0.5 ns
Problem-1:
b) Assume that on average the execution of an instruction takes 4 clock cycles.
Compute the performance of the CPU in terms of MIPS (millions of instructions per
second).
• Average execution of an instruction takes 4 clock cycles.
• Number of clock cycles per instruction (CPI) = 4
• Instructions per second = Number of clock cycles per second (clock rate R) /
Number of clock cycles per instruction (CPI)
= (2 * 10^9) / 4
= 0.5 * 10^3 * 10^6
= 500 * 10^6 instructions per second
• Performance = 500 MIPS


Problem-1:
c) Assume that executing a specific program of 400 million instructions takes 2
seconds. How many clock cycles does it take on average to execute an instruction of
this program?
Program of 400 million instructions = 400 * 10^6 instructions
CPI (Cycles/Instruction) = Total Number of Clock Cycles (C) / Total Number of
Instructions (I)
Clock Rate = Total Number of Clock Cycles / Execution Time in Seconds
Total Number of Clock Cycles = Clock Rate * Execution Time
Total Number of Clock Cycles = 2 GHz * 2 seconds
= 2 * 10^9 cycles per second * 2 seconds
= 4 * 10^9 cycles
CPI (Cycles/Instruction) = (4 * 10^9 cycles) / (400 * 10^6 instructions)
= 10 cycles per instruction
Therefore, on average 10 cycles are needed for one instruction.
Problem-2:
Consider a 3.2 GHz CPU where executing data processing (arithmetic and logical)
instructions takes 4 clock cycles and executing data transfer (load and store) instructions
takes 10 clock cycles. When a specific program of one million instructions runs, 60% of the
instructions are data processing and 40% of the instructions are data transfer. How long
does it take to run this program to completion?
Clock rate (number of clock cycles per second) R = 3.2 GHz
= 3.2 * 10^9 Hz
= 3.2 * 10^9 cycles per second
Number of Clock Cycles per Instruction (CPI) for data processing instructions = 4
Number of Clock Cycles per Instruction (CPI) for data transfer instructions = 10
Program size = 1 * 10^6 instructions
CPU_time = Clock cycles for a program / Clock rate


Problem-2:
Total Number of Clock Cycles = CPI (Cycles/Instruction) * Total Number of
Instructions (I), summed over both instruction classes
= (60/100) * 4 * 1 * 10^6 + (40/100) * 10 * 1 * 10^6
= 0.6 * 4 * 10^6 + 0.4 * 10 * 10^6
= 2.4 * 10^6 + 4 * 10^6
= 6.4 * 10^6 cycles
CPU_time = Clock cycles for a program / Clock rate
= (6.4 * 10^6 cycles) / (3.2 * 10^9 cycles per second)
= 2 * 10^-3 seconds
= 2 milliseconds


RISC vs CISC



Phases of instruction cycle?
In order for a single instruction to be executed by the CPU, it must go through the
instruction cycle (also sometimes referred to as the fetch-execute cycle). While this cycle
can vary from CPU to CPU, it typically consists of the following stages:

1. Fetch
2. Decode
3. Execute
4. Memory Access
5. Register Write-Back

In this article, we’ll go through the different stages of the instruction cycle to gain a better
understanding of how the CPU handles instructions.

Stages of the instruction cycle


Fetching the instruction

From the moment we turn our computers on, the CPU is ready to process instructions. A
register in the CPU referred to as the Program Counter (PC) stores the
memory address of the instruction that should be processed next. When it’s time to start
processing the instruction, the CPU uses the address in the PC to read the instruction from
memory and stores a copy of it in another register on the CPU called the Instruction Register (IR). Once the
instruction is available in the IR, it gets decoded.

Think of being at a deli. As you come in and give your order, a ticket containing your data
(name, number in line, and food order) is created and placed somewhere that the deli staff can
easily access and refer to. Once your number comes up, then someone will start working on
your order!

Decoding the instruction

The next stage in the cycle involves decoding the instruction. During this stage, the Control
Unit deciphers what the instruction stored in the IR means. For example, the instruction could
have been sent to do an arithmetic operation or to send information to another piece of
hardware. As the instruction is decoded, it is turned into a series of control signals that
are used to execute the instruction.

Back at the deli, a staff member picks up your order ticket. Before they start making your
order, they first need to figure out what you’re asking them to make!

Executing the instruction

In this stage, the instruction is performed! We noted that during the decoding stage, the
instruction is decoded into control signals. These signals are now sent to the relevant part of the
CPU, such as the ALU, where the operation is processed and completed.
In our deli example, this is the part where the order gets made!

So to recap, in order to process an instruction, we need to fetch it from memory, decode the
instruction, and execute it. That’s all, right? Not quite! Sometimes a few extra stages need to
occur before or after execution.

Memory access in the instruction cycle

The memory access stage is used to retrieve any data necessary to execute an
instruction. This stage only occurs if the instruction requires data from memory. For example,
imagine the following Python code:

x = 5
y = x + 3

Once the first statement is complete, a piece of memory holds the value 5 for x.
The second statement, y = x + 3, is a little trickier to execute because the value of y relies
on whatever value was assigned to x. Before y = x + 3 can be executed, we need to access
the memory location where x was stored in order to retrieve the data that tells us
what the value of x is.

Imagine in your deli order, you ask for honey mustard to be added to one of the two
sandwiches you order. Before your order can be created, the staff member needs to make and
retrieve honey mustard for the sandwich.

Register write-back

The register write-back stage is used if the execution of the instruction impacts data. This is
another stage that isn’t always a part of the cycle.

Let’s think back to our previous example:

x = 5
y = x + 3

As each instruction is executed, we find ourselves needing to save this data. During the
register write-back stage, this new data is stored to one of the registers in the CPU. The
register write-back stage is also necessary if existing data is changed or updated.

As the deli’s 10,000th customer, they decide to name your order after you and put it on their
“Deli Specials” board. They need to create space and allocate a part of the board’s “memory”
to store your order.

Conclusion

Nice job reaching the end of the article. Let’s recap what we learned:
• Instructions must go through the instruction cycle in order to be processed by the CPU
• Although the cycle varies amongst CPUs, the stages of the instruction cycle are:
1. Fetch the instruction.
2. Decode the instruction.
3. Execute the instruction.
4. Memory access (if needed).
5. Register write-back (if needed).
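
To make the stages concrete, here is a toy fetch-decode-execute loop for a made-up accumulator machine that runs the x = 5, y = x + 3 example. The opcodes (LDI, LDA, ADDI, STA, HLT) and the tuple encoding are hypothetical and chosen only for illustration; a real CPU encodes instructions as bit patterns.

```python
def run(program, memory):
    """Toy instruction cycle for a hypothetical accumulator machine."""
    pc = 0     # Program Counter: index of the next instruction
    acc = 0    # accumulator register
    while True:
        ir = program[pc]          # fetch: copy the instruction into the IR
        pc += 1                   # PC now points at the following instruction
        opcode, operand = ir      # decode: split into operation and operand
        if opcode == "LDI":       # execute, then register write-back
            acc = operand
        elif opcode == "LDA":     # memory access (read), then register write-back
            acc = memory[operand]
        elif opcode == "ADDI":    # execute in the ALU, then register write-back
            acc = acc + operand
        elif opcode == "STA":     # memory access (write)
            memory[operand] = acc
        elif opcode == "HLT":     # stop the cycle
            return memory
        else:
            raise ValueError(f"unknown opcode: {opcode}")


PROGRAM = [
    ("LDI", 5), ("STA", "x"),                  # x = 5
    ("LDA", "x"), ("ADDI", 3), ("STA", "y"),   # y = x + 3
    ("HLT", None),
]
print(run(PROGRAM, {}))  # {'x': 5, 'y': 8}
```

Note that only the LDA and STA instructions touch memory, which mirrors the point above that the memory access stage is not part of every instruction.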
Introduction of ALU and Data Path

Representing and storing numbers were the basic operations of early computers. The
real advance came when computation, that is, manipulating numbers by adding and multiplying them, came into the
picture. These operations are handled by the computer's arithmetic logic unit (ALU). The ALU is the
mathematical brain of a computer. The first complete ALU on a single chip was the
74181, which was implemented as part of the 7400 series of TTL (Transistor-Transistor Logic)
integrated circuits. It was introduced by Texas Instruments in 1970.
What is ALU?
An ALU is a digital circuit that performs arithmetic and logic operations. It is the fundamental building
block of the central processing unit of a computer. A modern central processing unit (CPU) has a very
powerful ALU that is complex in design. In addition to the ALU, a modern CPU contains a control
unit and a set of registers. Most of the operations are performed by one or more ALUs, which
load data from input registers. Registers are a small amount of storage available to the CPU
and can be accessed very fast. The control unit tells the ALU what operation to perform on
the available data. After the calculation or manipulation, the ALU stores the output in an output
register.

The CPU can be divided into two sections: the data section and the control section. The data section
is also known as the data path.
An Arithmetic Logic Unit (ALU) is a key component of the CPU responsible for performing
arithmetic and logical operations. The functional units that move and process data within
the processor, such as ALUs, registers, and buses, are together known as the data path; they execute
instructions and manipulate data during processing tasks.
BUS
In early computers, buses were parallel electrical wires with multiple hardware connections. More generally,
a bus is a communication system that transfers data between components inside a computer, or
between computers. It includes hardware components such as wires and optical fibers, and software,
including communication protocols. The registers, the ALU, and the interconnecting bus are
collectively referred to as the data path.
Types of the bus
There are mainly three types of bus:
1. Address bus: transfers memory addresses from the processor to components like storage and
input/output devices. It is one-way communication.
2. Data bus: carries the data between the processor and other components. The data bus is
bidirectional.
3. Control bus: carries control signals from the processor to other components. The control bus
also carries the clock's pulses. Individual control lines are unidirectional, although some of them carry status signals back to the processor.
A bus can be dedicated, i.e., used for a single purpose, or multiplexed, i.e.,
used for multiple purposes. Different kinds of buses give rise to different types of bus
organization.
Registers
In computer architecture, registers are very fast memory inside the CPU, used to execute
programs and operations efficiently. In this role, registers act as staging points, passing
signals to various components to carry out small tasks. Register signals are directed by the control
unit, which also operates the registers.
The following five registers are used to store incoming and outgoing signal data:
1. Program Counter
A program counter (PC) is a CPU register that holds the address of the next instruction
to be executed from memory. As each instruction is fetched, the program counter is
incremented so that it points to the next instruction (by 1 on a word-addressed machine).
It is a digital counter needed for faster execution of tasks as well as for tracking the
current execution point.
2. Instruction Register
In computing, an instruction register (IR) is the part of a CPU's control unit that holds the
instruction currently being executed or decoded. The instruction register specifically holds the
instruction and provides it to the instruction decoder circuit.
3. Memory Address Register
The Memory Address Register (MAR) is the CPU register that stores either the memory
address from which data will be fetched to the CPU, or the address to which data will be
sent and stored. It is a temporary storage component in the CPU (central processing unit):
it holds the address (location) of the data only until the memory access for the
instruction using that data has completed.
4. Memory Data Register
The memory data register (MDR) is the register in a computer's processor, or central
processing unit, CPU, that stores the data being transferred to and from the immediate access
storage. Memory data register (MDR) is also known as memory buffer register (MBR).
5. General Purpose Register
General-purpose registers are used to store temporary data within the microprocessor. They
are multipurpose registers and can be used either by a programmer or by a user.
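The way these registers cooperate during an instruction fetch can be sketched in a few lines. This is a minimal illustration, not a model of any particular processor: the `Cpu` class, the three-instruction `memory`, and the assumption of a word-addressed machine (so the PC advances by 1) are all made up for the example.

```python
# Hypothetical word-addressed memory: one instruction per address.
memory = {0: "LOAD R1, 100", 1: "ADD R1, R2", 2: "STORE R1, 101"}


class Cpu:
    def __init__(self):
        self.pc = 0      # Program Counter: address of the next instruction
        self.mar = None  # Memory Address Register: address being accessed
        self.mdr = None  # Memory Data Register (MBR): word just read
        self.ir = None   # Instruction Register: instruction being decoded

    def fetch(self, memory):
        self.mar = self.pc           # PC -> MAR
        self.mdr = memory[self.mar]  # memory[MAR] -> MDR
        self.ir = self.mdr           # MDR -> IR
        self.pc += 1                 # PC now points at the next instruction
        return self.ir


cpu = Cpu()
fetched = [cpu.fetch(memory) for _ in range(3)]
print(fetched)
print("PC =", cpu.pc, "MAR =", cpu.mar)
```

After three fetches the PC holds 3, while the MAR still holds the address of the last instruction fetched.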
What is Data Path?
Suppose that the CPU needs to carry out a data processing action, such as copying data from
memory to a register or vice versa, moving content from one register to another, or adding
two numbers in the ALU. Whenever such an action takes place in the CPU, the data involved in
the operation follows a particular path, called the data path.
Data paths are made up of various functional components, such as multipliers or arithmetic logic
units. A data path is required to carry out data processing operations.
One Bus Organization
In one bus organization, a single bus is used for multiple purposes. A set of general-purpose registers,
the program counter, the instruction register, the memory address register (MAR), and the memory data
register (MDR) are connected to the single bus. Memory reads and writes are done through the MAR and
MDR. The program counter points to the memory location from which the next instruction is to be fetched.
The instruction register holds a copy of the current instruction. In one bus organization, only one
operand can be read from the bus at a time.
As a result, if an operation needs two operands, the read operation has to be carried out twice, which
makes the process a little longer. The advantage of one bus organization is that it is one of the
simplest organizations and is very cheap to implement. The disadvantage is that there is only one bus,
and this one bus is shared by all the general-purpose registers, the program counter, the instruction
register, the MAR, and the MDR, which makes every operation sequential. This architecture is rarely
recommended today.
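The cost of sharing one bus can be seen by stepping through a register-to-register addition. The sketch below assumes two temporary registers, named Y (ALU input) and Z (ALU output) here, which the text above does not mention; they are a common way to hold one operand and the result while the bus is busy. The control-step names are illustrative only.

```python
def one_bus_add(regs, dst, src1, src2):
    """Compute regs[dst] = regs[src1] + regs[src2] over one shared bus.

    Y and Z are assumed temporary registers at the ALU input and output.
    Returns the list of control steps, one bus transfer per step.
    """
    steps = []

    bus = regs[src1]          # step 1: first operand onto the bus
    y = bus                   #         latched into Y
    steps.append(f"{src1}out, Yin")

    bus = regs[src2]          # step 2: second operand onto the bus
    z = y + bus               #         ALU adds Y and the bus value into Z
    steps.append(f"{src2}out, ADD, Zin")

    bus = z                   # step 3: result onto the bus
    regs[dst] = bus           #         written to the destination register
    steps.append(f"Zout, {dst}in")

    return steps


regs = {"R1": 5, "R2": 7, "R3": 0}
steps = one_bus_add(regs, "R3", "R1", "R2")
print(steps)
print(regs["R3"])
```

Because the bus carries one value per step, the two operand reads and the result write cannot overlap, so the addition takes three steps in this model.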
Two Bus Organization
To overcome the disadvantage of one bus organization, another architecture was developed, known as
two bus organization. In two bus organization, there are two buses. The general-purpose registers can
read from and write to both buses. In this case, two operands can be fetched at the same time because
of the two buses: one bus fetches an operand for the ALU and the other fetches for a register. When
both buses are busy fetching operands, the output can be stored in a temporary register and placed on
a bus once the buses are free.

The two buses can also be dedicated as an in-bus and an out-bus. The general-purpose registers read
data from the in-bus and write data to the out-bus.
Three Bus Organization
In three bus organization there are three buses: OUT bus 1, OUT bus 2, and an IN bus. The two OUT
buses carry the operands from the general-purpose registers to the ALU, where they are evaluated, and
the output is placed on the IN bus so it can be sent to the destination register. This implementation
is a bit more complex but faster, because two operands can flow into the ALU and the result can flow
out of the ALU in parallel. It was developed to overcome the busy waiting problem of two bus
organization. In this structure, after execution, the output can be placed on the bus without waiting
because of the presence of an extra bus. The structure is given below in the figure.

The main advantages of multiple bus organizations over the single bus are as given below.
1. Increase in size of the registers.
2. Reduction in the number of cycles for execution.
3. Faster execution.
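The reduction in cycles can be made concrete by counting steps for a single addition R3 <- R1 + R2 under each organization. The schedules below are a simplified assumption (each bus carries one value per step, and the ALU result is available within the same step on the three bus design); real designs differ in detail.

```python
# Assumed schedules for R3 <- R1 + R2; each inner list is one step, and
# each entry in it is a value travelling on its own bus during that step.
SCHEDULES = {
    "one bus": [["R1"], ["R2"], ["result"]],
    "two bus": [["R1", "R2"], ["result"]],
    "three bus": [["R1", "R2", "result"]],
}
BUS_COUNT = {"one bus": 1, "two bus": 2, "three bus": 3}


def steps_needed(organization):
    schedule = SCHEDULES[organization]
    # No step may use more buses than the organization provides.
    assert all(len(step) <= BUS_COUNT[organization] for step in schedule)
    return len(schedule)


step_counts = {name: steps_needed(name) for name in SCHEDULES}
print(step_counts)
```

Under these assumptions the same addition needs three, two, and one step respectively.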
DMA
OVERVIEW

⚫ Introduction
⚫ Implementing DMA in a computer system
⚫ Data transfer using DMA controller
⚫ Internal configuration of a DMA controller
⚫ Process of DMA transfer
⚫ DMA transfer modes
Direct Memory Access

⚫ Introduction
⚫ An important aspect governing the Computer System
performance is the transfer of data between memory and I/O
devices.
⚫ The operation involves loading programs or data files from disk
into memory, saving files to disk, and accessing virtual memory
pages on any secondary storage medium.
Computer System with DMA
Block diagram of DMA controller
[Figure: block diagram of a DMA controller. Data bus buffers and address bus buffers connect the
data bus and address bus to an internal bus shared by the address register, word count register,
and control register. Control logic handles the DS (DMA select), RS (register select), RD (read),
WR (write), BR (bus request), BG (bus grant), and interrupt lines, together with the DMA request
and DMA acknowledge lines on the I/O device side.]
⚫ Consider a typical system consisting of a CPU, memory
and one or more input/output devices as shown in fig.
Assume one of the I/O devices is a disk drive and that
the computer must load a program from this drive into
memory.
⚫ The CPU would read the first byte of the program and
then write that byte to memory. Then it would do the
same for the second byte, until it had loaded the entire
program into memory.
⚫ This process proves to be inefficient. Loading data into,
and then writing data out of, the CPU significantly slows
down the transfer. The CPU does not modify the data at
all, so it only serves as an additional stop for data on the
way to its final destination.
⚫ The process would be much quicker if we could bypass
the CPU & transfer data directly from the I/O device to
memory.
⚫ Direct Memory Access does exactly that.
Implementing DMA in a Computer System
⚫ A DMA controller implements direct memory access in
a computer system.
⚫ It connects directly to the I/O device at one end and to
the system buses at the other end. It also interacts with
the CPU, both via the system buses and two new direct
connections.
⚫ It is sometimes referred to as a channel. In an alternate
configuration, the DMA controller may be incorporated
directly into the I/O device.
Data Transfer using DMA Controller
⚫ To transfer data from an I/O device to memory, the
DMA controller first sends a Bus Request to the CPU by
setting BR to 1. When it is ready to grant this request,
the CPU sets its Bus Grant signal, BG, to 1.
⚫ The CPU also tri-states its address, data, and control
lines, thus truly granting control of the system buses to
the DMA controller.
⚫ The CPU will continue to tri-state its outputs as long as
BR is asserted.
Internal Configuration

⚫ The DMA controller includes several registers:
• The DMA Address Register contains the memory address to be
used in the data transfer. The CPU treats this register as one or more
output ports.
• The DMA Count Register, also called the Word Count Register,
contains the number of bytes of data to be transferred. Like the DMA
address register, it too is treated as an output port (with a different
address) by the CPU.
• The DMA Control Register accepts commands from the CPU. It
is also treated as an output port by the CPU.
⚫ Although not shown in this figure, most DMA controllers also
have a Status Register. This register supplies information to
the CPU, which accesses it as an input port.
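DMA TRANSFER
[Figure: DMA transfer in a computer system. The CPU, the random-access memory unit (RAM), and the
DMA controller share the address bus, the data bus, and the read and write control lines (RD, WR).
The CPU and the DMA controller are linked by BR, BG, and an interrupt line; an address select block
feeds the controller's DS and RS inputs; the controller exchanges DMA request and DMA acknowledge
signals with the I/O peripheral device.]
Internal Configuration of DMA Controller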
Process of DMA Transfer
⚫ To initiate a DMA transfer, the CPU loads the address of
the first memory location of the memory block (to be
read from or written to) into the DMA address register. It
does this via an I/O output instruction.
⚫ It then writes the number of bytes to be transferred into the
DMA count register in the same manner.
⚫ Finally, it writes one or more commands to the DMA
control register.
⚫ These commands may specify transfer options such as
the DMA transfer mode, but should always specify the
direction of the transfer, either from I/O to memory or
from memory to I/O.
⚫ The last command causes the DMA controller to initiate
the transfer. The controller then sets BR to 1 and, once
BG becomes 1, seizes control of the system buses.
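The programming sequence above can be sketched as a small simulation. Everything here is a hypothetical model for illustration: the `DmaController` class, the direction names `"io_to_mem"` and `"mem_to_io"`, and the use of Python lists for memory and the device buffer are assumptions, not the interface of any real controller.

```python
class DmaController:
    def __init__(self):
        self.address = 0     # DMA address register
        self.count = 0       # DMA (word) count register
        self.control = None  # DMA control register: transfer direction
        self.br = 0          # bus request line to the CPU

    def program(self, address, count, direction):
        """CPU side: three output-port writes, the last one starts the DMA."""
        self.address = address
        self.count = count
        self.control = direction  # "io_to_mem" or "mem_to_io" (assumed names)
        self.br = 1               # controller requests the system buses

    def run(self, bg, memory, device):
        """Move the block once the CPU has answered BR with BG = 1."""
        if not (self.br and bg):
            return 0
        moved = 0
        while self.count > 0:
            if self.control == "io_to_mem":
                memory[self.address] = device.pop(0)
            else:
                device.append(memory[self.address])
            self.address += 1
            self.count -= 1
            moved += 1
        self.br = 0  # release the buses back to the CPU
        return moved


memory = [0] * 8
device = [10, 20, 30]        # bytes waiting in the I/O device
dma = DmaController()
dma.program(address=2, count=3, direction="io_to_mem")
moved = dma.run(bg=1, memory=memory, device=device)
print(moved, memory)
```

The address register counts up and the count register counts down with every byte, so the controller knows both where to write next and when the block is finished.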
DMA Transfer Modes
Modes vary by how the DMA controller determines when to
transfer data, but the actual data transfer process is the same for
all the modes.

⚫ Burst mode / Block transfer mode / Idle mode
⚫ An entire block of data is transferred in one contiguous sequence. Once
the DMA controller is granted access to the system buses by the CPU, it
transfers all bytes of data in the data block before releasing control of the
system buses back to the CPU.
⚫ This mode is useful for loading programs or data files into memory, but it
does render the CPU inactive for relatively long periods of time.
⚫ Cycle stealing Mode/Single byte transfer mode
⚫ Viable alternative for systems in which the CPU should not be
disabled for the length of time needed for Burst transfer modes.
⚫ The DMA controller obtains access to the system buses as in burst mode,
using the BR and BG signals. However, it transfers one byte of data and
then returns control of the system buses to the CPU by setting BR = 0.
⚫ The DMA controller (DMAC) and the CPU alternately take bus cycles from
each other.
⚫ The controller continually issues requests via BR, transferring one byte of
data per request, until it has transferred its entire block of data.
⚫ By continually obtaining and releasing control of the system buses, the
DMA controller essentially interleaves instruction & data transfers. The
CPU processes an instruction, then the DMA controller transfers a data
value, and so on.
⚫ The data block is not transferred as quickly as in burst mode, but the CPU
is not idled for as long as in that mode.
⚫ Useful for controllers monitoring data in real time.
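The difference between burst mode and cycle stealing can be illustrated by listing who owns the bus in each time slot. This is a simplified sketch: it assumes one slot per CPU instruction and one slot per DMA byte, and the function names are invented for the example.

```python
def burst(n_bytes, cpu_instructions):
    # The DMA controller keeps the buses until the whole block is moved.
    return ["DMA"] * n_bytes + ["CPU"] * cpu_instructions


def cycle_stealing(n_bytes, cpu_instructions):
    # The CPU runs one instruction, then the controller moves one byte.
    timeline = []
    while n_bytes or cpu_instructions:
        if cpu_instructions:
            timeline.append("CPU")
            cpu_instructions -= 1
        if n_bytes:
            timeline.append("DMA")
            n_bytes -= 1
    return timeline


def longest_cpu_stall(timeline):
    # Longest run of consecutive slots in which the CPU has no bus access.
    longest = current = 0
    for owner in timeline:
        current = current + 1 if owner == "DMA" else 0
        longest = max(longest, current)
    return longest


burst_timeline = burst(3, 3)
stealing_timeline = cycle_stealing(3, 3)
print(burst_timeline, longest_cpu_stall(burst_timeline))
print(stealing_timeline, longest_cpu_stall(stealing_timeline))
```

Both timelines move the same three bytes, but in burst mode the CPU is locked out for three consecutive slots, while with cycle stealing it never waits more than one.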
⚫ Transparent Mode:

⚫ This requires the most time to transfer a block of data, yet it is
also the most efficient in terms of overall system performance.
⚫ The DMA controller only transfers data when the CPU is
performing operations that do not use the system buses.
⚫ For example, the Relatively Simple CPU has several states that
move or process data solely within the CPU:
NOP (no operation)
MOV A, B (data transfer between CPU registers)
ADD B, C (arithmetic operation; both operands are CPU registers)
⚫ Primary advantage is that CPU never stops executing its programs and
DMA transfer is free in terms of time.
⚫ Disadvantage is that the hardware needed to determine when the CPU
is not using the system buses can be quite complex and relatively
expensive.
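A rough sketch of transparent mode follows. It assumes the controller can tell from the opcode alone whether an instruction stays inside the CPU; the `LOAD` and `STORE` instructions are hypothetical bus-using instructions added only to contrast with the register-only examples in the text.

```python
# Instructions assumed to stay inside the CPU (taken from the examples above).
BUS_FREE_OPS = {"NOP", "MOV", "ADD"}


def transparent_dma(program, bytes_pending):
    """Move one DMA byte during each instruction that leaves the buses free."""
    timeline = []
    for instruction in program:
        opcode = instruction.split()[0]
        if opcode in BUS_FREE_OPS and bytes_pending > 0:
            timeline.append((instruction, "DMA byte"))
            bytes_pending -= 1
        else:
            timeline.append((instruction, "no DMA"))
    return timeline, bytes_pending


program = ["LOAD A, 100", "MOV A, B", "ADD B, C", "STORE 101, A", "NOP"]
timeline, remaining = transparent_dma(program, bytes_pending=4)
for entry in timeline:
    print(entry)
print("bytes still pending:", remaining)
```

The CPU executes every instruction without delay, but only three of the four bytes get through in this run, which shows why transparent mode takes the longest to finish a block.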
⚫ Demand transfer mode:
⚫ Data transfer occurs only on the demand of the peripheral
device.
⚫ It is similar to block transfer, except that the DMA request
signal (DMAR, also written DREQ) must remain active throughout
the DMA operation.
⚫ If DMAR goes low during the operation, the DMA operation
stops and bus control returns to the CPU.
⚫ Once DMAR goes high again, the DMA operation continues
from where it had stopped.
⚫ When DREQ = 1, data transfer is performed.
⚫ When DREQ = 0, the CPU is the bus master.
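Demand transfer can be sketched by sampling the request line once per time slot. The function below is a hypothetical model: it assumes one byte moves per slot while the request line is high, and that the transfer resumes where it stopped when the line returns to 1.

```python
def demand_transfer(dreq_samples, n_bytes):
    """Return the bus master for each slot and the number of bytes moved."""
    bus_master = []
    sent = 0
    for dreq in dreq_samples:
        if dreq == 1 and sent < n_bytes:
            bus_master.append("DMA")  # device is demanding: move one byte
            sent += 1
        else:
            bus_master.append("CPU")  # request dropped or block finished
    return bus_master, sent


bus_master, sent = demand_transfer([1, 1, 0, 0, 1, 1, 1], n_bytes=4)
print(bus_master)
print("bytes moved:", sent)
```

The two slots where the request line is 0 go to the CPU, and the remaining two bytes move as soon as the line goes high again.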
Summary
⚫ Advantages of DMA
⚫ Computer system performance is improved by direct transfer of
data between memory and I/O devices, bypassing the CPU.
⚫ CPU is free to perform operations that do not use system buses.

⚫ Disadvantages of DMA
⚫ In case of Burst Mode data transfer, the CPU is rendered
inactive for relatively long periods of time.
