
Processor Architecture and Design
PAD © K.R. Anupama

Introduction

HOW DOES A MICROPROCESSOR HANDLE AN INSTRUCTION?
Fetch Cycle
• The fetch cycle reads the required instruction from memory and stores it in the instruction register
Execute Cycle
• The execute cycle performs the actual actions the instruction specifies
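To make the two cycles concrete, here is a minimal sketch of a fetch-execute loop in Python, assuming a hypothetical toy ISA; the memory layout, opcodes, and registers are illustrative, not taken from the slides:

```python
# Minimal fetch-execute loop for a hypothetical toy ISA.
# Assumptions: memory is a list of (opcode, operand) tuples and the
# only opcodes are LOAD, ADD, and HALT.

memory = [("LOAD", 5), ("ADD", 3), ("HALT", None)]

pc = 0          # program counter
ir = None       # instruction register
acc = 0         # accumulator

running = True
while running:
    # Fetch cycle: read the instruction at PC into IR, advance PC.
    ir = memory[pc]
    pc += 1

    # Execute cycle: perform the action the opcode specifies.
    opcode, operand = ir
    if opcode == "LOAD":
        acc = operand
    elif opcode == "ADD":
        acc += operand
    elif opcode == "HALT":
        running = False

print(acc)  # 8
```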

[Figure: Block diagram of a microprocessor. The Bus Interface Unit (BIU) contains the memory interface, program address generator, and instruction register, and drives the address and data buses (with RD/WR control) to ROM, RAM, I/O ports, video, and discs. The Execution Unit (EU) contains the ALU, the register set (A-H), and control & timing driven by the clock (CLK).]


RISC VS CISC

Which is better?

WHAT IS THE EFFECT?
If operands can be present anywhere (register or memory):
• The size of instructions varies
• This complicates the instruction decoder

ISA
• CISC
  • Operands for arithmetic/logic operations can be in registers or memory
• RISC
  • Operands for arithmetic/logic operations can only be in registers
  • Register-register architecture
RISC Vs CISC

Goal: multiply the data at memory location A by the data at B and put the result back in A.

CISC:
MUL A,B

RISC:
LDA R0,A
LDA R1,B
MUL R0,R1
STR A,R0

[Figure: memory locations A, B, C and registers R0-R3 feeding the ALU (×, ÷, +, −).]

$$\frac{\text{Time}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Time}}{\text{Cycle}}$$

RISC aims to lower cycles per instruction; CISC aims to lower instructions per program.
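To see what the equation implies for the example above, here is a small worked comparison; the instruction counts, CPIs, and cycle time below are assumed for illustration only:

```python
# Iron law: time/program = (instructions/program) x (cycles/instruction)
#           x (time/cycle).
# The counts and CPIs are illustrative assumptions, not data from the
# slides: CISC needs fewer instructions but more cycles per instruction.

def exec_time(instructions, cpi, cycle_time_ns):
    return instructions * cpi * cycle_time_ns

cisc = exec_time(instructions=1, cpi=10, cycle_time_ns=2)   # MUL A,B
risc = exec_time(instructions=4, cpi=1,  cycle_time_ns=2)   # LDA/LDA/MUL/STR

print(f"CISC: {cisc} ns, RISC: {risc} ns")  # CISC: 20 ns, RISC: 8 ns
```

Which side wins depends entirely on the assumed CPI and cycle time, which is exactly the trade-off the equation expresses.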
Processor Speed-up

Introduction

Speed-up comes from:
• Deeply pipelined machines
• Many instructions per cycle
• Out-of-order execution of instructions
• Aggressive branch-prediction techniques

The Three Walls

THE
THE POWER THE ILP
MEMORY
WALL WALL
WALL
PAD© K.R.Anupama

The Power Wall

• Power dissipation depends on clock rate, capacitive load, and voltage
• Increasing the clock frequency dissipates more power and demands more cooling
• Decreasing the voltage reduces dynamic power consumption but increases static leakage through the transistors
• Practical limits of cooling have been reached
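These trade-offs follow from the standard dynamic-power relation P = α · C · V² · f. A back-of-the-envelope sketch, with all values assumed purely for illustration:

```python
# Dynamic power: P = activity x capacitive load x voltage^2 x frequency.
# All numbers below are illustrative assumptions, not measurements.

def dynamic_power(alpha, cap_farads, volts, freq_hz):
    return alpha * cap_farads * volts**2 * freq_hz

base    = dynamic_power(0.1, 1e-9, 1.2, 2e9)   # 2 GHz at 1.2 V
faster  = dynamic_power(0.1, 1e-9, 1.2, 4e9)   # doubling f doubles P
lowered = dynamic_power(0.1, 1e-9, 0.9, 4e9)   # lowering V helps (V^2 term)

print(f"{base:.2f} W -> {faster:.2f} W -> {lowered:.2f} W")
```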

The Memory Wall



Covered in this course:

ILP
• Pipelined
• VLIW
• Superscalar

DLP
• SIMD
• Vector architectures
• GPU

TLP
• MIMD
• Multi-threaded
• Distributed-memory MIMD
• Shared-memory MIMD

Architecture, Implementation & Realization

Architecture
• ISA
• Functional-level behaviour of the processor

Implementation
• Micro-architecture
• Logic structure that implements the architecture

Realization
• Physical implementation

ISA: Contract between Hardware & Software

• Contract between software and hardware
• Multiple machines can implement the same ISA
• Advantage: program portability
• Microprocessor design starts with the ISA
• The ISA gives rise to a micro-architecture
• The micro-architecture has to be rigorously verified

ISA
• Development is very slow
• ISAs have varied in:
  • Number of operands
  • Implied operands
  • Whether operands may be stored on a stack

Dynamic-Static Interface

The DSI separates what is done statically (at compile time) from what is done dynamically (at run time).

DSI

[Figure: the DSI sits between the program (software) and the machine (hardware). Above the interface lies the architecture: compiler complexity, exposed to software, handled statically. Below it: hardware complexity, exposed to hardware, handled dynamically.]

DSI

[Figure: placement of the DSI between the HLL and the hardware. Different ISA styles (DEL, CISC, VLIW, RISC) place the interface at different levels (DSI1, DSI2, DSI3).]

What is parallel computing? Serial Computing

• Traditionally, software has been written for serial computation:
• To be run on a single computer having a single Central Processing Unit (CPU)
• A problem is broken into a discrete series of instructions
• Instructions are executed one after another
• Only one instruction may execute at any moment in time

Serial Computing

[Figure: a problem fed to a single CPU as one stream of instructions T1, T2, T3, ..., TN executed in sequence.]

What is parallel computing

• In the simplest sense, parallel computing is the simultaneous use of multiple compute resources to solve a computational problem:
• To be run using multiple CPUs
• A problem is broken into discrete parts that can be solved concurrently
• Each part is further broken down to a series of instructions
• Instructions from each part execute simultaneously on different CPUs
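A minimal sketch of this decomposition, assuming a toy problem (summing squares) split across a process pool; the chunking and worker count are illustrative:

```python
# The problem is broken into discrete parts, each solved on its own CPU
# via a process pool; the serial version executes the same parts one
# after another. Purely illustrative.

from multiprocessing import Pool

def part_sum(bounds):
    lo, hi = bounds
    return sum(i * i for i in range(lo, hi))

if __name__ == "__main__":
    n = 1_000_000
    chunks = [(i, i + n // 4) for i in range(0, n, n // 4)]

    # Serial: one CPU executes the parts one after another.
    serial = sum(part_sum(c) for c in chunks)

    # Parallel: each part's instructions run simultaneously on a CPU.
    with Pool(4) as pool:
        parallel = sum(pool.map(part_sum, chunks))

    assert serial == parallel
```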

What is parallel computing

[Figure: a problem broken into four parts (Problem 1-4), each solved simultaneously on its own CPU (CPU 1-4).]

Parallel Computing

The compute resources may be:
• A single processor with multiple cores
• A single computer with multiple processors
• An arbitrary number of computers connected by a network
• A combination of all three

Parallel Computing

• The computational problem should be able to:
• Be broken apart into discrete pieces of work that can be solved simultaneously
• Execute multiple program instructions at any moment in time
• Be solved in less time with multiple compute resources than with a single compute resource
The most important law in micro-architecture

Amdahl's Law

$$\text{Speedup} = \frac{T_{total}}{T_{improved}} = \frac{T_{total}}{\left(T_{total} - T_{component}\right) + \dfrac{T_{component}}{n}}$$
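A direct encoding of the law as a helper function; the example timings are assumed for illustration:

```python
# Amdahl's law: speed up one component by a factor n; the rest of the
# execution time is untouched.

def amdahl_speedup(t_total, t_component, n):
    t_improved = (t_total - t_component) + t_component / n
    return t_total / t_improved

# Example (assumed numbers): a component taking 80 of 100 s, sped up 4x.
print(amdahl_speedup(100, 80, 4))  # 100 / (20 + 20) = 2.5
```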

Law of Diminishing Returns

[Figure: speedup versus the enhanced fraction f_enh; the unenhanced fraction 1 − f_enh limits the achievable speedup.]

Types and Levels of Parallelism

• Functional parallelism (irregular)
• Data-level parallelism (regular)

Functional Parallelism

• Instruction level
• Loop level (recurrences)
• Procedure level
• Program level

Flynn's Taxonomy

SISD, SIMD, MISD, MIMD

Basic Parallel Techniques

• Pipelining
• Replication

ILP

TYPES OF ILP PROCESSORS
• Traditional von Neumann: sequential issue, sequential execution
• Scalar ILP: sequential issue, parallel execution
• Superscalar ILP: parallel issue, parallel execution
  • VLIW: static schedule
  • Superscalar: dynamic schedule
INTERNAL OPERATION

Pipelined processors vs. VLIW/superscalar

VLIW & SUPERSCALAR ARCHITECTURE

[Figure: multiple execution units (EU1, EU2, EU3) sharing a common register file.]
VLIW

[Figure: a single instruction-fetch unit issues one long instruction directly to EU1, EU2, and EU3 over a shared register file.]

SUPERSCALAR

[Figure: an instruction-fetch unit feeds a dispatch unit, which dynamically routes instructions to EU1, EU2, and EU3 over a shared register file.]

PIPELINE: SCALAR

[Figure: a single scalar pipeline.]
AMDAHL'S LAW

$$\text{Speedup} = \frac{T_{total}}{T_{improved}} = \frac{T_{total}}{\left(T_{total} - T_{component}\right) + \dfrac{T_{component}}{n}}$$
PIPELINE: N STAGES

• Phase 1: Filling
• Phase 2: Full
• Phase 3: Draining
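The three phases translate into a simple cycle count: an N-stage pipeline needs N - 1 cycles to fill, then completes one instruction per cycle until it drains. A sketch, with N and the instruction count assumed for illustration:

```python
# Cycles for k instructions on an N-stage pipeline: (N - 1) fill cycles
# plus one completion per cycle. Numbers are illustrative.

def pipeline_cycles(n_stages, k_instructions):
    return (n_stages - 1) + k_instructions

n, k = 5, 100
cycles = pipeline_cycles(n, k)             # 104
# Versus an unpipelined machine taking n cycles per instruction:
speedup = (n * k) / cycles                 # ~4.8, approaches N as k grows
print(cycles, round(speedup, 2))
```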
IDEALIZED PIPELINE EXECUTION

[Figure: execution profile of an N-stage pipeline; for a fraction g of the time the pipeline runs full (degree N), and for a fraction 1 − g it runs at degree 1.]

REALISTIC PIPELINE EXECUTION PROFILE

[Figure: a realistic profile repeatedly fills and drains around stalls, so the pipeline spends less of its time at full depth N.]
AMDAHL'S LAW

$$S = \frac{1}{(1 - g) + \dfrac{g}{N}}$$
AMDAHL'S LAW

N = 5:  g = 100 % gives S = 5;  g = 90 % gives S = 3.57
N = 10: g = 100 % gives S = 10; g = 90 % gives S = 5.26
N = 20: g = 100 % gives S = 20; g = 90 % gives S = 6.897
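These speedups follow directly from the formula; a quick check, where g is the fraction of time the pipeline runs full:

```python
# S = 1 / ((1 - g) + g / N): pipeline speedup when a fraction g of the
# work can keep the N-stage pipeline full.

def pipeline_speedup(g, n):
    return 1 / ((1 - g) + g / n)

for n in (5, 10, 20):
    print(n, pipeline_speedup(1.0, n), round(pipeline_speedup(0.9, n), 3))
# 5  5.0  3.571
# 10 10.0 5.263
# 20 20.0 6.897
```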
AMDAHL'S LAW

$$S = \frac{1}{\dfrac{g_1}{1} + \dfrac{g_2}{2} + \dots + \dfrac{g_N}{N}}$$

where $g_i$ is the fraction of time the pipeline executes $i$ instructions per cycle.
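A sketch of this generalized form, with an assumed distribution of parallelism degrees g_i:

```python
# S = 1 / (g_1/1 + g_2/2 + ... + g_N/N), where g_i is the fraction of
# time i instructions complete per cycle. The distribution is assumed.

def general_speedup(g):
    return 1 / sum(gi / i for i, gi in enumerate(g, start=1))

print(round(general_speedup([0.2, 0.3, 0.5]), 3))  # degrees 1, 2, 3
# 1 / (0.2 + 0.15 + 0.1667) = 1.935
```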
AMDAHL'S LAW: SUPERSCALAR

$$S = \frac{1}{(1 - f) + \dfrac{f}{N}}$$
AMDAHL'S LAW: SUPERSCALAR

$$S = \frac{1}{\dfrac{1 - f}{N_1} + \dfrac{f}{N_2}}$$

where $N_1$ and $N_2$ are the degrees of parallelism of the base and enhanced modes.
3F
• Flynn (2)
• Foster (51)
• Fisher (90)
PARAMETERS: JOUPPI CLASSIFICATION

• Operation latency (OL)
• Issue latency (IL)
• Machine parallelism (MP)
• Issue parallelism (IP)

All are static parameters.

Super-pipelining introduces minor cycles (the base cycle is subdivided).
[Figure: four-stage pipeline: Fetch, Decode, Execute, Writeback. The F, D, and E units sit over a register file backed by cache memory.]
BASE PIPELINE

Stages: IF, DE, EX, WB
• Operation latency (OL): 1
• Machine parallelism (MP): 4
• Issue latency (IL): 1
• Issue parallelism (IP): 1
SUPER-PIPELINED

Stages: IF, DE, EX, WB, each divided into 3 minor cycles
• Operation latency (OL): 1 base cycle (3 minor cycles)
• Machine parallelism (MP): 12
• Issue latency (IL): 1 minor cycle
• Issue parallelism (IP): 3
PIPELINE

• Under-pipelined: execution > issue
• Super-pipelined: issue > execution
• A super-pipelined machine is a deeply pipelined machine
• Restrictions on forwarding paths
MIPS R4000

• 8 physical stages
• Each stage: 10 ns
• External clock: 50 MHz (20 ns base cycle)
• A clock doubler inside the chip runs the pipeline on 10 ns minor cycles
MIPS R4000 PIPELINE

IF1, IF2, RF, EX, DF1, DF2, TC, WB
SUPERSCALAR

IF DE EX WB
IF DE EX WB
IF DE EX WB

• Operation latency (OL): 1
• Machine parallelism (MP): 12
• Issue latency (IL): 1
• Issue parallelism (IP): 3
SUPERSCALAR, SUPER-PIPELINED

IF DE EX WB
IF DE EX WB
IF DE EX WB

• Operation latency (OL): 1 base cycle (3 minor cycles)
• Machine parallelism (MP): 36
• Issue latency (IL): 1 minor cycle
• Issue parallelism (IP): 9
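The MP entries across these four tables are consistent with one reading: machine parallelism is the number of instructions in flight, i.e. pipeline stages × superpipelining degree × issue width. That reading is my inference from the four configurations, not stated on the slides:

```python
# Machine parallelism as instructions in flight (inferred pattern):
# MP = stages x superpipelining degree x issue width.

def machine_parallelism(stages, superpipeline, issue_width):
    return stages * superpipeline * issue_width

print(machine_parallelism(4, 1, 1))  # base pipeline              -> 4
print(machine_parallelism(4, 3, 1))  # super-pipelined            -> 12
print(machine_parallelism(4, 1, 3))  # superscalar                -> 12
print(machine_parallelism(4, 3, 3))  # superscalar, super-pipelined -> 36
```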
VLIW

IF DE EX WB
      EX WB
      EX WB

(a single fetch/decode issuing to three execution units)
DYNAMIC BEHAVIOR: EFFECT OF DEPENDENCIES
DEPENDENCIES BETWEEN INSTRUCTIONS
• Data
• Control
• Resource
DATA DEPENDENCY
• Straight-line code
• Loops
STRAIGHT LINE CODE

RAW / true dependency
• Load-use:
  I1: load r1, a
  I2: add r2, r1, r1
• Define-use:
  I1: mul r1, r4, r5
  I2: add r2, r1, r1
STRAIGHT LINE CODE

WAR / false / anti dependency
I1: mul r1, r2, r3
I2: add r2, r4, r5
STRAIGHT LINE CODE

WAW / output dependency
I1: mul r1, r2, r3
I2: add r1, r4, r5
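All three hazard classes can be detected mechanically by comparing each instruction's destination register with the other's sources and destination. A minimal sketch; the (dest, src1, src2) tuple encoding is my own, not the slides':

```python
# Classify data dependencies between two instructions i1 -> i2.
# Each instruction is (dest, src1, src2); the encoding is illustrative.

def dependencies(i1, i2):
    d1, s1a, s1b = i1
    d2, s2a, s2b = i2
    deps = []
    if d1 in (s2a, s2b):
        deps.append("RAW")   # true: i2 reads what i1 writes
    if d2 in (s1a, s1b):
        deps.append("WAR")   # anti: i2 overwrites a source of i1
    if d1 == d2:
        deps.append("WAW")   # output: both write the same register
    return deps

print(dependencies(("r1", "r4", "r5"), ("r2", "r1", "r1")))  # ['RAW']
print(dependencies(("r1", "r2", "r3"), ("r2", "r4", "r5")))  # ['WAR']
print(dependencies(("r1", "r2", "r3"), ("r1", "r4", "r5")))  # ['WAW']
```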
RECURRENCES

Inter-iteration / loop-carried dependency:

do I = 2, n
  X(I) = A * X(I-1) + B
end do

• First order
• kth order
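The loop-carried dependency is easy to see when the recurrence is executed: every iteration reads the value the previous one wrote, so the iterations cannot run concurrently. A Python rendering with assumed values for A, B, and the array:

```python
# First-order recurrence: X[i] = A * X[i-1] + B.
# Each iteration reads the value the previous one wrote (loop-carried
# RAW), so iterations must execute in order. Values are illustrative.

A, B = 2.0, 1.0
X = [1.0] * 8

for i in range(1, len(X)):      # do I = 2, n
    X[i] = A * X[i - 1] + B     # depends on iteration i-1

print(X)  # [1.0, 3.0, 7.0, 15.0, 31.0, 63.0, 127.0, 255.0]
```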
DATA DEPENDENCY GRAPH

i1: load r1, a
i2: load r2, b
i3: add r3, r2, r1
i4: mul r1, r2, r4
i5: div r1, r2, r4

[Figure: the DDG for this code. True-dependency edges (δt) run from i1 and i2 to i3; an anti-dependency edge (δa) runs from i3 to i4; an output-dependency edge (δo) runs from i4 to i5.]
DIFFERENCE BETWEEN DFG & DDG

• A DFG has no control statements
• A DFG shows only RAW dependencies
• Compilers create both the DDG and the DFG
BASIC BLOCK

calc: add r3, r1, r2
      sub r4, r1, r2
      mul r5, r3, r4
      mul r7, r6, r6
      sub r8, r7, r5
      jn negproc
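A basic block is a straight-line sequence entered only at the top and ended by a branch, as in the code above. A minimal sketch of splitting a program into basic blocks; the (label, instruction) encoding and the branch set are assumptions for illustration:

```python
# Split a linear instruction list into basic blocks: a block starts at
# a label (or program start) and ends at a branch. The instruction
# format (optional label, opcode string) is illustrative.

BRANCHES = {"jn", "jz", "jmp"}

def basic_blocks(program):
    blocks, current = [], []
    for label, inst in program:
        if label and current:            # a label starts a new block
            blocks.append(current)
            current = []
        current.append(inst)
        if inst.split()[0] in BRANCHES:  # a branch ends the block
            blocks.append(current)
            current = []
    if current:
        blocks.append(current)
    return blocks

prog = [("calc", "add r3,r1,r2"), (None, "sub r4,r1,r2"),
        (None, "mul r5,r3,r4"), (None, "mul r7,r6,r6"),
        (None, "sub r8,r7,r5"), (None, "jn negproc")]
print(len(basic_blocks(prog)))  # 1 -- the whole sequence is one block
```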
CONTROL DEPENDENCIES

mul r1, r2, r3
jz zproc
sub r4, r7, r1
.
.
zproc: load r1, x
CONTROL DEPENDENCIES

• General-purpose programs: 20-30 % of instructions are branches
• Scientific/technical programs: 5-10 %
• Average branch distance:
  • 4.6 (a branch every 3rd to 6th instruction)
  • 9.2 (a branch every 10th to 20th instruction)
RESOURCE DEPENDENCY

Example: a single non-pipelined division unit
div r1, r2, r3
div r4, r5, r6
