Computer Architecture and Organization

Here are the key requirements for the MIPS datapath based on the instructions:
- A 32-entry, 32-bit register file to store the 32 registers
- Ability to read two registers from the register file for source operands
- An ALU to perform arithmetic and logical operations on the source operands
- Ability to write the result of the ALU operation back to a register
This will allow the datapath to execute the ADD, SUB, and ORI instructions by reading registers, performing the operation, and writing the result back to a register.

55:035

Computer Architecture and Organization

Lecture 9
Outline
 Building a CPU
 Basic Components
 MIPS Instructions
 Basic 5 Steps for CPU
 Single-Cycle Design
 Multi-cycle Design
 Comparison of Single and Multi-cycle Designs



Overview
 Brief look
 Digital logic

 CPU Datapath
 MIPS Example



Digital Logic
[Figure: D-type flip-flop (edge-triggered; inputs D and Clock, output Q); multiplexer (inputs A and B on ports 0 and 1, select input S, output F); D-type flip-flop with enable (inputs D, EN, and Clock, output Q).]


Digital Logic

Registers
[Figure: registers built from edge-triggered D flip-flops with enable (EN): 1 bit (D, Q), 4 bits (D3..D0, Q3..Q0), and N bits.]
Digital Logic
Tri-state Driver (Buffer)
In  Drive  Out
0   0      Z
1   0      Z
0   1      0
1   1      1

What is Z ?? High impedance: with Drive = 0 the buffer disconnects from its output and drives neither 0 nor 1.


Digital Logic
Adder/Subtractor or ALU
[Figure: adder/subtractor or ALU block with operand inputs A and B, control input Add/sub or ALUop, Carry-in, and Carry-out.]


Overview
 Brief look
 Digital logic

 How to Design a CPU Datapath


 MIPS Example



Designing a CPU: 5 Steps
 Analyze the instruction set → datapath requirements
 MIPS: ADD, SUB, ORI, LW, SW, BR
 Meaning of each instruction given by RTL (register transfers)
 2 types of registers: CPU/ISA registers, temporary registers

 Datapath requirements → select the datapath components


 ALU, register file, adder, data memory, etc

 Assemble the datapath


 Datapath must support planned register transfers
 Ensure all instructions are supported

 Analyze datapath control required for each instruction


 Assemble the control logic



Step 1a: Analyze ISA
 All MIPS instructions are 32 bits long.
 Three instruction formats:

Format   Bits 31:26   25:21    20:16    15:11    10:6     5:0
R-type   op           rs       rt       rd       shamt    funct
         6 bits       5 bits   5 bits   5 bits   5 bits   6 bits
I-type   op           rs       rt       immediate (15:0)
         6 bits       5 bits   5 bits   16 bits
J-type   op           target address (25:0)
         6 bits       26 bits

 R: registers, I: immediate, J: jumps
 These formats intentionally chosen to simplify design


Step 1b: Analyze ISA
R-type: op (6 bits) | rs (5 bits) | rt (5 bits) | rd (5 bits) | shamt (5 bits) | funct (6 bits)
I-type: op (6 bits) | rs (5 bits) | rt (5 bits) | immediate (16 bits)
J-type: op (6 bits) | target address (26 bits)
 Meaning of the fields:
 op: operation of the instruction
 rs, rt, rd: the source and destination register specifiers
 Destination is either rd (R-type), or rt (I-type)
 shamt: shift amount
 funct: selects the variant of the operation in the “op” field
 immediate: address offset or immediate value
 target address: target address of the jump instruction
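The field boundaries above map directly onto shifts and masks. The following is a minimal Python sketch of that mapping; the function name `decode` and the dictionary keys are illustrative choices, not something the lecture defines. It extracts every field regardless of format and leaves it to the caller to use the ones that apply.

```python
def decode(instr: int) -> dict:
    """Split a 32-bit MIPS instruction word into its fields (all formats at once)."""
    return {
        "op":     (instr >> 26) & 0x3F,    # bits 31:26
        "rs":     (instr >> 21) & 0x1F,    # bits 25:21
        "rt":     (instr >> 16) & 0x1F,    # bits 20:16
        "rd":     (instr >> 11) & 0x1F,    # bits 15:11 (R-type only)
        "shamt":  (instr >> 6) & 0x1F,     # bits 10:6  (R-type only)
        "funct":  instr & 0x3F,            # bits 5:0   (R-type only)
        "imm16":  instr & 0xFFFF,          # bits 15:0  (I-type only)
        "target": instr & 0x3FFFFFF,       # bits 25:0  (J-type only)
    }

# addu $3, $1, $2 is R-type: op = 0, rs = 1, rt = 2, rd = 3, shamt = 0, funct = 0x21
word = (0 << 26) | (1 << 21) | (2 << 16) | (3 << 11) | (0 << 6) | 0x21
fields = decode(word)
print(fields["rs"], fields["rt"], fields["rd"], hex(fields["funct"]))  # 1 2 3 0x21
```

Because rs and rt sit at the same bit positions in R-type and I-type, the register file can be addressed before the format is known, which is one way the formats simplify the design.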
MIPS ISA: subset for today
 ADD and SUB (R-type: op | rs | rt | rd | shamt | funct)
 addU rd, rs, rt
 subU rd, rs, rt

 OR Immediate (I-type: op | rs | rt | immediate)
 ori rt, rs, imm16

 LOAD and STORE Word (I-type: op | rs | rt | immediate)
 lw rt, rs, imm16
 sw rt, rs, imm16

 BRANCH (I-type: op | rs | rt | immediate)
 beq rs, rt, imm16


Step 2: Datapath Requirements
REGISTER FILE
 MIPS ISA requires 32 registers, 32b each
 Called a register file
 Contains 32 entries
 Each entry is 32b

 AddU rd,rs,rt or SubU rd,rs,rt
 Read two sources rs, rt
 Operation rs + rt or rs – rt
 Write destination rd ← rs+/-rt

 Requirements
 Read two registers (rs, rt)
 Perform ALU operation
 Write a third register (rd)

How to implement?
[Figure: REGFILE with register-number inputs RdReg1, RdReg2, WrReg (5 bits each), data ports RdData1, RdData2, WrData, and control input RegWrite; ALU with control input ALUop and outputs Result and Zero?.]
Step 3: Datapath Assembly
 ADDU rd, rs, rt SUBU rd, rs, rt
 Need an ALU
 Hook it up to REGISTER FILE
 REGFILE has 2 read ports (rs,rt), 1 write port (rd)

[Figure: instruction fields rs, rt, rd supply RdReg1, RdReg2, WrReg of the REGFILE; RdData1 and RdData2 feed the ALU; the ALU Result returns to WrData; the ALU also produces Zero?.]
 Parameters come from instruction fields
 Control signals (RegWrite, ALUop) depend upon instruction fields
 Eg: ALUop = f(Instruction) = f(op, funct)
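As a rough behavioral model of this step, the sketch below pairs a register file that has two read ports and one write port with a two-function ALU. The names `RegFile`, `clock_edge`, and `alu`, and the string-valued `alu_op`, are assumptions made for illustration; the model also ignores the MIPS convention that register 0 always reads as zero.

```python
MASK = 0xFFFFFFFF  # keep every value within 32 bits

class RegFile:
    """32 entries of 32 bits: two read ports, one write port."""

    def __init__(self):
        self.regs = [0] * 32

    def read(self, rd_reg1, rd_reg2):
        # Read ports are combinational: RdData1/RdData2 follow the register numbers.
        return self.regs[rd_reg1], self.regs[rd_reg2]

    def clock_edge(self, wr_reg, wr_data, reg_write):
        # The write takes effect only at a clock edge, and only when RegWrite is set.
        if reg_write:
            self.regs[wr_reg] = wr_data & MASK

def alu(a, b, alu_op):
    """Return (Result, Zero?) for the two operations ADDU/SUBU need."""
    if alu_op == "add":
        result = (a + b) & MASK
    elif alu_op == "sub":
        result = (a - b) & MASK
    else:
        raise ValueError(f"unsupported ALUop: {alu_op}")
    return result, result == 0

# addu $3, $1, $2 with $1 = 5 and $2 = 7
rf = RegFile()
rf.regs[1], rf.regs[2] = 5, 7
a, b = rf.read(1, 2)               # rs -> RdReg1, rt -> RdReg2
result, zero = alu(a, b, "add")    # ALUop chosen from (op, funct)
rf.clock_edge(3, result, True)     # rd -> WrReg, Result -> WrData
print(rf.regs[3])                  # 12
```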
Steps 2 and 3: ORI Instruction
 ORI rt, rs, Imm16
 Need new ALUop for ‘OR’ function, hook up to REGFILE
 1 read port (rs), 1 write port (rt), 1 const value (Imm16)

[Figure: rs supplies RdReg1 and rt supplies WrReg; Imm16 (16 bits) passes through ZERO-EXTEND; a multiplexer controlled by ALUsrc selects either RdData2 (input 0) or the extended immediate (input 1) as the second ALU operand; the ALU Result returns to WrData.]
 Control signals (RegWrite, ALUop, ALUsrc) depend upon instruction fields
 E.g.: ALUsrc = f(Instruction) = f(op, funct)
Steps 2 and 3 Destination Register
 Must select proper destination, rd or rt
 Depends on Instruction Type
 R-type may write rd
 I-type may write rt

[Figure: the ORI datapath with an added multiplexer, controlled by RegDst, that selects WrReg from rt or rd.]


Steps 2 and 3: Load Word
 LW rt, rs, Imm16
 Need Data Memory: data ← Mem[Addr]
 Addr is rs+Imm16, Imm16 is signed, use ALU for +
 Store in rt: rt ← Mem[rs+Imm16]

[Figure: DATAMEM added to the datapath: the ALU Result supplies Addr; a multiplexer controlled by MemtoReg selects the ALU Result or the DATAMEM RdData as REGFILE WrData; the extender becomes SIGN/ZERO-EXTEND, controlled by ExtOp.]
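ORI zero-extends its immediate while LW and SW sign-extend it, which is why the extender needs the ExtOp control. A small Python sketch of the two behaviors follows; the function names `zero_ext` and `sign_ext` mirror the notation used in the register transfers, and the example address values are made up.

```python
MASK = 0xFFFFFFFF

def zero_ext(imm16):
    """16 -> 32 bits, upper half filled with zeros (used by ORI)."""
    return imm16 & 0xFFFF

def sign_ext(imm16):
    """16 -> 32 bits, upper half filled with copies of bit 15 (used by LW, SW, BEQ)."""
    imm16 &= 0xFFFF
    return (imm16 | 0xFFFF0000) if imm16 & 0x8000 else imm16

# Effective address for lw with rs = 0x1000 and Imm16 = 0xFFFC (that is, -4)
rs_value = 0x1000
addr = (rs_value + sign_ext(0xFFFC)) & MASK
print(hex(addr))                # 0xffc
print(hex(zero_ext(0xFFFC)))    # 0xfffc
```

With sign extension the ALU's ordinary 32-bit addition handles negative offsets; no separate subtract path is needed.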
Steps 2 and 3: Store Word
 SW rt, rs, Imm16
 Need Data Memory: Mem[Addr] ← data
 Addr is rs+Imm16, Imm16 is signed, use ALU for +
 Store in Mem: Mem[rs+Imm16] ← rt

[Figure: the Load Word datapath with RdData2 connected to the DATAMEM WrData input and a new MemWrite control signal.]
Writes: Need to Control Timing
 Problem: write to data memory
 Data can come anytime
 Addr must come first
 MemWrite must come after Addr
 Else? writes to wrong Addr!

 Solution: use ideal data memory


 Assume everything works ok
 How to fix this for real?
 One solution: synchronous memory
 Another solution: delay MemWr to come late

 Problems?: write to register file


 Does RegWrite signal come after WrReg number?
 When does the write to a register happen?
 Read from same register as being written?



Missing Pieces: Instruction Fetching
 Where does the Instruction come from?
 From instruction memory, of course!

 Recall: stored-program concept


 Alternatives? How about hard-coding wires and switches…? This
is how ENIAC was programmed!

 How to branch?
 BEQ rs, rt, Imm16



Instruction Processing
 Fetch instruction
 Execute instruction

 Fetch next instruction


 Execute next instruction

 Fetch next instruction


 Execute next instruction

 Etc…

 How to maintain sequence? Use a counter!


 Branches (out of sequence) ? Load the counter!



Instruction Processing
 Program Counter
 Points to current instruction

 Address to instruction memory


 Instr ← InstrMem[PC]

 Next instruction: counts up by 4


 Remember: memory is byte-addressable, instructions are 4 bytes
 PC ← PC + 4

 Branch instruction: replace PC contents



Step 1: Analyze Instructions
 Register Transfer Language…
op | rs | rt | rd | shamt | funct = InstrMem[ PC ]
op | rs | rt | Imm16 = InstrMem[ PC ]

Instr   Register Transfers
ADDU    R[rd] ← R[rs] + R[rt]; PC ← PC + 4
SUBU    R[rd] ← R[rs] – R[rt]; PC ← PC + 4
ORI     R[rt] ← R[rs] OR zero_ext(Imm16); PC ← PC + 4
LOAD    R[rt] ← MEM[ R[rs] + sign_ext(Imm16) ]; PC ← PC + 4
STORE   MEM[ R[rs] + sign_ext(Imm16) ] ← R[rt]; PC ← PC + 4
BEQ     if ( R[rs] == R[rt] ) then PC ← PC + 4 + { sign_ext(Imm16) || b’00’ }
        else PC ← PC + 4
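The register transfers can be read as a tiny interpreter. The Python sketch below is one such reading under stated simplifications: instructions are given as tuples rather than encoded words, memory is a dictionary keyed by byte address, and the demo runs the list in order instead of fetching through the PC. The names `step`, `cpu`, and `program` are illustrative.

```python
MASK = 0xFFFFFFFF

def sign_ext(imm16):
    imm16 &= 0xFFFF
    return (imm16 | 0xFFFF0000) if imm16 & 0x8000 else imm16

def step(cpu, instr):
    """Apply one instruction's register transfers, including the PC update."""
    R, MEM = cpu["R"], cpu["MEM"]
    name, x, y, z = instr
    next_pc = (cpu["PC"] + 4) & MASK
    if name == "ADDU":                       # (rd, rs, rt)
        R[x] = (R[y] + R[z]) & MASK
    elif name == "SUBU":                     # (rd, rs, rt)
        R[x] = (R[y] - R[z]) & MASK
    elif name == "ORI":                      # (rt, rs, imm16), zero-extended
        R[x] = R[y] | (z & 0xFFFF)
    elif name == "LOAD":                     # (rt, rs, imm16)
        R[x] = MEM.get((R[y] + sign_ext(z)) & MASK, 0)
    elif name == "STORE":                    # (rt, rs, imm16)
        MEM[(R[y] + sign_ext(z)) & MASK] = R[x]
    elif name == "BEQ":                      # (rs, rt, imm16)
        if R[x] == R[y]:
            # sign_ext(Imm16) with two zero bits appended equals a left shift by 2
            next_pc = (next_pc + (sign_ext(z) << 2)) & MASK
    else:
        raise ValueError(f"unsupported instruction: {name}")
    cpu["PC"] = next_pc

cpu = {"PC": 0, "R": [0] * 32, "MEM": {}}
program = [
    ("ORI", 1, 0, 5),          # R[1] = 5
    ("ORI", 2, 0, 7),          # R[2] = 7
    ("ADDU", 3, 1, 2),         # R[3] = 12
    ("STORE", 3, 0, 0x100),    # MEM[0x100] = 12
    ("LOAD", 4, 0, 0x100),     # R[4] = 12
    ("BEQ", 3, 4, 0xFFFF),     # taken, offset -1 word: PC = 24 - 4 = 20
]
for instr in program:
    step(cpu, instr)
print(cpu["R"][3], cpu["R"][4], cpu["PC"])   # 12 12 20
```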
Steps 2 and 3: Datapath & Assembly

[Figure: the PC supplies the Read address of the Instruction Memory, which outputs Instruction[31:0]; an adder computes PC + 4.]

 PC: a register
 Counter, counts by +4
 Provides address to Instruction Memory
Steps 2 and 3: Datapath & Assembly

[Figure: the PC feeds the Instruction Memory (Read address → Instruction[31:0]); one adder computes PC + 4; a second adder adds the extended Imm16 after Shift Left 2; a multiplexer controlled by PCSrc selects the next PC. Instruction[25:21], Instruction[20:16], Instruction[15:11], and Instruction[15:0] (Imm16) are broken out; Imm16 passes through the Sign/Zero Extend unit (16 → 32 bits, control ExtOp).]
PC: a register
 Counter, counts by +4
 Sometimes, must add SignExtend{Imm16||b’00’} for branch instructions
Note: the sign-extender for Imm16 is already in the datapath (everything else is new)
Steps 2 and 3: Add Previous Datapath

[Figure: the instruction-fetch logic joined to the previous datapath. Instruction[25:21] and Instruction[20:16] select the read registers; a multiplexer controlled by RegDst selects the write register from Instruction[20:16] or Instruction[15:11]; Instruction[15:0] (Imm16) passes through the Sign/Zero Extend unit (ExtOp); ALU Control takes Instruction[5:0] (funct) and ALUOp; the Data Memory follows the ALU. Control signals: RegWrite, RegDst, ALUSrc, ALUOp, ExtOp, MemWrite, MemtoReg, PCSrc.]
What have we done?
 Created a simple CPU datapath
 Control still missing (next slide)

 Single-cycle CPU
 Every instruction takes 1 clock cycle
 Clocking ?



One Clock Cycle
 Clock Locations
 PC, REGFILE have clocks

 Operation
 On rising edge, PC will get new value
 Maybe REGFILE will have one value updated as well
 After rising edge
 PC and REGFILE can’t change
 New value out of PC
 Instruction out of INSTRMEM
 Instruction selects registers to read from REGFILE
 Instruction controls ALUop, ALUsrc, MemWrite, ExtOp, etc
 ALU does its work
 DataMem may be read (depending on instruction)
 Result value goes back to REGFILE
 New PC value goes back to PC
 Await next clock edge

Lots to do in only 1 clock cycle !!


Missing Steps?
 Control is missing (Steps 4 and 5 we mentioned earlier)
 Generate the green signals
 ALUsrc, MemWrite, MemtoReg, PCSrc, RegDst, etc
 These are all f(Instruction), where f() is a logic expression
 Will look at control strategies in upcoming lecture

 Implementation Details
 How to implement REGFILE?
 Read port: tristate buffers? Multiplexer? Memory?
 Two read ports: two of above?
 Write port: how to write only 1 register?
 How to control writes to memory? To register file?

 More instructions
 Shift instructions
 Jump instruction
 Etc



1-Cycle CPU Datapath
[Figure: the complete single-cycle datapath assembled in Steps 2 and 3, without the control unit.]
1-cycle CPU Datapath + Control

[Figure: the single-cycle datapath with a Control unit that decodes Instruction[31:26] and generates RegDst, Branch, MemRead, MemtoReg, ALUOp, MemWrite, ALUSrc, and RegWrite; the ALU control block takes Instruction[5:0]; PCSrc selects the next PC.]
1-cycle CPU Control – Lookup Table
Input or Output   Signal Name   R-format   Lw   Sw   Beq
Inputs            Op5           0          1    1    0
                  Op4           0          0    0    0
                  Op3           0          0    1    0
                  Op2           0          0    0    1
                  Op1           0          1    1    0
                  Op0           0          1    1    0
Outputs           RegDst        1          0    X    X
                  ALUSrc        0          1    1    0
                  MemtoReg      0          1    X    X
                  RegWrite      1          1    0    0
                  MemRead       0          1    0    0
                  MemWrite      0          0    1    0
                  Branch        0          0    0    1
                  ALUOp1        1          0    0    0
                  ALUOp0        0          0    0    1
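The lookup table can be transcribed almost literally into a dictionary keyed by opcode. This Python sketch assumes that representation; `CONTROL` and `control` are illustrative names, `None` stands for a don't-care (X), and ALUOp1/ALUOp0 are packed into one two-bit value.

```python
# Opcode bits Op5..Op0 -> control outputs, transcribed from the table above.
CONTROL = {
    0b000000: dict(RegDst=1, ALUSrc=0, MemtoReg=0, RegWrite=1,       # R-format
                   MemRead=0, MemWrite=0, Branch=0, ALUOp=0b10),
    0b100011: dict(RegDst=0, ALUSrc=1, MemtoReg=1, RegWrite=1,       # lw
                   MemRead=1, MemWrite=0, Branch=0, ALUOp=0b00),
    0b101011: dict(RegDst=None, ALUSrc=1, MemtoReg=None, RegWrite=0, # sw
                   MemRead=0, MemWrite=1, Branch=0, ALUOp=0b00),
    0b000100: dict(RegDst=None, ALUSrc=0, MemtoReg=None, RegWrite=0, # beq
                   MemRead=0, MemWrite=0, Branch=1, ALUOp=0b01),
}

def control(instr):
    """Look up the control outputs for a 32-bit instruction word."""
    return CONTROL[(instr >> 26) & 0x3F]

signals = control(0x8C220004)   # lw $2, 4($1): opcode 100011
print(signals["ALUSrc"], signals["MemtoReg"], signals["RegWrite"])  # 1 1 1
```

A table like this is what makes the single-cycle controller "just a lookup table or memory": the outputs are a pure function of the opcode, with no state.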

 Also: I-type instructions (ORI) & ExtOp (sign-extend control), etc.


1-cycle CPU + Jump Instruction
[Figure: the single-cycle datapath and control extended for jumps: Jump address [31..0] is formed from PC + 4 [31..28] and Instruction[25:0].]
1-cycle CPU Problems?
 Every instruction 1 cycle
 Some instructions “do more work”
 Eg, lw must read from DATAMEM
 All instructions must have same clock period…

 Many instructions run slower than necessary

 Tricky timing on MemWrite, RegWrite(?) signals


 Write signal must come *after* address is stable

 Need extra resources…


 PC+4 adder, ALU for BEQ instruction, DATAMEM+INSTRMEM



Performance!
 Single-Cycle CPU Performance
 Execute one instruction per clock cycle (CPI=1)
 Clock cycle time? Note dataflow includes:
 INSTRMEM read
 REGFILE access
 Sign extension
 ALU operation
 DATAMEM read
 REGFILE/PC write
 Not every instruction uses all resources (eg, DATAMEM read)
 Can we change clock period for each instruction?
 No! (Why not?)
 One clock period: the worst case!
 This is why a single-cycle CPU is not good for performance

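To make the "worst case" argument concrete, the Python sketch below sums per-unit delays along each instruction's path and takes the maximum as the clock period. All delay values are hypothetical numbers chosen for illustration (the lecture gives none), and sign extension and PC update are assumed to overlap with other units.

```python
# Hypothetical unit delays in picoseconds; not taken from the lecture.
DELAY_PS = {
    "INSTRMEM": 200,
    "REGFILE_READ": 100,
    "ALU": 200,
    "DATAMEM": 200,
    "REGFILE_WRITE": 100,
}

# Units each instruction class passes through in one single-cycle execution.
PATH = {
    "R-type": ["INSTRMEM", "REGFILE_READ", "ALU", "REGFILE_WRITE"],
    "LW":     ["INSTRMEM", "REGFILE_READ", "ALU", "DATAMEM", "REGFILE_WRITE"],
    "SW":     ["INSTRMEM", "REGFILE_READ", "ALU", "DATAMEM"],
    "BEQ":    ["INSTRMEM", "REGFILE_READ", "ALU"],
}

latency = {name: sum(DELAY_PS[u] for u in units) for name, units in PATH.items()}
clock_period = max(latency.values())   # one fixed period must fit the slowest path

for name, ps in latency.items():
    print(f"{name}: needs {ps} ps, idles {clock_period - ps} ps per cycle")
print("clock period:", clock_period, "ps")
```

Under these assumed delays LW sets the period, and every other instruction spends part of each cycle idle.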


1-cycle CPU Datapath + Controller
[Figure: the complete single-cycle datapath with controller, including the jump path (Jump address [31..0] formed from PC + 4 [31..28] and Instruction[25:0]).]
1-cycle CPU Summary
 Operation
 1 cycle per instruction
 Control signals held fixed during entire cycle (except BRANCH)
 Only 2 registers
 PC, updated every clock cycle
 REGFILE, updated when required
 During clock cycle, data flows from register-outputs to register-inputs
 Fixed clock frequency / period

 Performance
 1 instruction per cycle
 Slowest instruction determines clock frequency

 Outstanding issue: MemWrite timing


 Assume this signal writes to memory at end of clock cycle



Multi-cycle CPU Goals
 Improve performance
 Break each instruction into smaller steps / multiple cycles
 LW instruction → 5 cycles
 SW instruction → 4 cycles
 R-type instruction → 4 cycles
 Branch, Jump → 3 cycles
 Aim for 5x clock frequency
 Complex instructions (eg, LW) → 5 cycles → same performance as before
 Simple instructions (eg, ADD) → fewer cycles → faster

 Save resources (gates/transistors)


 Re-use ALU over multiple cycles
 Put INSTR + DATA in same memory

 MemWrite timing solved?



Multi-cycle CPU Datapath

[Figure: multi-cycle datapath. A multiplexer selects the Address for a single Memory; MemData goes to the Instruction Register and the Memory Data Register; the Registers block (RdReg1, RdReg2, Write reg, Write data) outputs RdData1 and RdData2 into A and B; multiplexers select the ALU inputs from among the PC, A, B, the constant 4, and Instr[15:0] after Sign Extend, with or without Shift Left 2; the ALU produces Zero and the ALU result, which is held in ALUOut.]

 Add multiplexers + control signals (IorD, MemtoReg, ALUSrcA, ALUSrcB)


 Move signal paths (+4, Shift Left 2)
Multi-cycle CPU Datapath

[Figure: the multi-cycle datapath, repeated.]

 Add registers + control signals (IR, MDR, A, B, ALUOut)


 Registers with no control signal load value every clock cycle (eg, PC)
Instruction Execution Example
 Execute a “Load Word” instruction
 LW rt, 0(rs)

 5 Steps
1. Fetch instruction
2. Read registers
3. Compute address
4. Read data
5. Write registers



Load Word Instruction Sequence
[Figure: the multi-cycle datapath, repeated once for each step below and once more with all 5 steps shown.]

1. Fetch Instruction
InstructionRegister ← Mem[PC]

2. Read Registers
A ← Registers[Rs]

3. Compute Address
ALUOut ← A + SignExt(Imm16)

4. Read Data
MDR ← Memory[ALUOut]

5. Write Registers
Registers[Rt] ← MDR


Multi-cycle Load Word: Recap
1. Fetch Instruction    InstructionRegister ← Mem[PC]; PC ← PC + 4

2. Read Registers       A ← Registers[Rs]

3. Compute Address      ALUOut ← A + SignExt(Imm16)

4. Read Data            MDR ← Memory[ALUOut]

5. Write Registers      Registers[Rt] ← MDR

 Missing Steps?
 Must increment the PC
 Do it as part of the instruction fetch (in step 1)
 Need PCWrite control signal
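The five steps can be traced in software by treating IR, A, ALUOut, and MDR as named slots that carry values from one cycle to the next. The Python sketch below does this for a single LW; the function name `multicycle_lw`, the dictionary layout, and the memory contents are assumptions made for the example.

```python
MASK = 0xFFFFFFFF

def sign_ext(imm16):
    imm16 &= 0xFFFF
    return (imm16 | 0xFFFF0000) if imm16 & 0x8000 else imm16

def multicycle_lw(cpu):
    """Run one LW through the five steps; return the number of cycles used."""
    mem, regs = cpu["Mem"], cpu["Registers"]
    # Cycle 1: fetch instruction, increment PC
    cpu["IR"] = mem[cpu["PC"]]
    cpu["PC"] = (cpu["PC"] + 4) & MASK
    # Cycle 2: read registers
    rs = (cpu["IR"] >> 21) & 0x1F
    rt = (cpu["IR"] >> 16) & 0x1F
    cpu["A"] = regs[rs]
    # Cycle 3: compute address (no shift: load offsets are in bytes)
    cpu["ALUOut"] = (cpu["A"] + sign_ext(cpu["IR"] & 0xFFFF)) & MASK
    # Cycle 4: read data
    cpu["MDR"] = mem[cpu["ALUOut"]]
    # Cycle 5: write registers
    regs[rt] = cpu["MDR"]
    return 5

cpu = {
    "PC": 0,
    "Registers": [0] * 32,
    # One shared memory holds both the instruction and the data word.
    "Mem": {0x0: 0x8C220004,    # lw $2, 4($1)
            0x104: 99},
}
cpu["Registers"][1] = 0x100
cycles = multicycle_lw(cpu)
print(cycles, hex(cpu["ALUOut"]), cpu["Registers"][2], cpu["PC"])  # 5 0x104 99 4
```

Each intermediate register is written in one cycle and read in a later one, which is exactly the property the RTL validity check asks for.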


Multi-cycle R-Type Instruction
1. Fetch Instruction InstructionRegister ← Mem[PC]; PC ← PC + 4

2. Read Registers A ← Registers[Rs]; B ← Registers[Rt]

3. Compute Value ALUOut ← A op B

4. Write Registers Registers[Rd] ← ALUOut

 RTL describes data flow action in each clock cycle


 Control signals determine precise data flow
 Each step implies unique control values



Multi-cycle R-Type Instruction: Control Signal Values
1. Fetch Instruction InstructionRegister ← Mem[PC]; PC ← PC + 4
MemRead=1, ALUSrcA=0, IorD=0, IRWrite, ALUSrcB=01, ALUop=00, PCWrite, PCSource=00

2. Read Registers A ← Registers[Rs]; B ← Registers[Rt]
ALUSrcA=0, ALUSrcB=11, ALUop=00

3. Compute Value ALUOut ← A op B
ALUSrcA=1, ALUSrcB=00, ALUop=10

4. Write Registers Registers[Rd] ← ALUOut
RegDst=1, RegWrite, MemtoReg=0

 Each step implies unique control values
 Fixed for entire cycle
 “Default value” implied if unspecified


Check Your Work – Is RTL Valid ?
1. Datapath check
 Within one cycle…
 Each cycle has valid data flow path (path exists)
 Each register gets only one new value
 Across multiple cycles…
 Each register value is defined in an earlier clock cycle before it is used
 Eg, “A ← 3” must occur before “B ← A”
 Make sure register value doesn’t disappear if set >1 cycle earlier

2. Control signal check
 Each cycle, RTL describing the datapath flow implies a value for each control signal
 0 or 1 or default or don’t care
 Each control signal gets only one fixed value the entire cycle

3. Overall check
 Does the sequence of steps work ?


Multi-cycle BEQ Instruction
1. Fetch Instruction
InstructionRegister ← Mem[PC]; PC ← PC + 4

2. Read Registers, Precompute Target
A ← Registers[Rs] ; B ← Registers[Rt] ; ALUOut ← PC + {SignExt(Imm16),b’00’}

3. Compare Registers, Conditional Branch
if( (A – B) ==0 ) PC ← ALUOut

The PC calculation flow (highlighted in green on the slide) runs in parallel with the other operations.
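The three BEQ cycles can be traced the same way. In this Python sketch the branch target is precomputed into ALUOut during cycle 2, using the PC value already incremented in cycle 1, and the comparison in cycle 3 decides whether it is loaded. The helper names `multicycle_beq` and `make_cpu` and the test values are illustrative assumptions.

```python
MASK = 0xFFFFFFFF

def sign_ext(imm16):
    imm16 &= 0xFFFF
    return (imm16 | 0xFFFF0000) if imm16 & 0x8000 else imm16

def multicycle_beq(cpu):
    """Run one BEQ through its three steps; return the number of cycles used."""
    regs = cpu["Registers"]
    # Cycle 1: fetch instruction, increment PC
    cpu["IR"] = cpu["Mem"][cpu["PC"]]
    cpu["PC"] = (cpu["PC"] + 4) & MASK
    # Cycle 2: read registers and precompute the target while the ALU is otherwise idle
    ir = cpu["IR"]
    cpu["A"] = regs[(ir >> 21) & 0x1F]
    cpu["B"] = regs[(ir >> 16) & 0x1F]
    cpu["ALUOut"] = (cpu["PC"] + (sign_ext(ir & 0xFFFF) << 2)) & MASK
    # Cycle 3: the ALU subtracts; a zero result loads the target into the PC
    if ((cpu["A"] - cpu["B"]) & MASK) == 0:
        cpu["PC"] = cpu["ALUOut"]
    return 3

def make_cpu(r1, r2):
    regs = [0] * 32
    regs[1], regs[2] = r1, r2
    # 0x10220003 encodes beq $1, $2, 3 (opcode 000100)
    return {"PC": 0x40, "Mem": {0x40: 0x10220003}, "Registers": regs}

taken = make_cpu(9, 9)
not_taken = make_cpu(9, 8)
multicycle_beq(taken)
multicycle_beq(not_taken)
print(hex(taken["PC"]), hex(not_taken["PC"]))   # 0x50 0x44
```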


Multi-cycle Datapath with Control Signals
[Figure: the multi-cycle datapath annotated with its control signals: PCSrc, PCWrite, IRWrite, IorD, MemRead, MemWrite, MemtoReg, RegDst, RegWrite, ALUSrcA, ALUSrcB, and ALUOp; ALU Control takes Instruction[5:0]; the Jump address [31..0] is formed from Instr[25:0] and PC[31..28].]


Multi-cycle Datapath with Controller

[Figure: the same datapath with a controller driven by Instr[31:26].]
Multi-cycle CPU Control: Overview

[Figure: control state diagram; each state lists its control signal outputs.]

 General approach: Finite State Machine (FSM)


 Need details in each branch of control…
 Precise outputs for each state (Mealy depends on inputs, Moore does not)
 Precise “next state” for each state (can depend on inputs)

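A Moore-style next-state function for such a controller might look like the Python sketch below. The state names (`fetch`, `decode`, `mem_addr`, and so on) and the instruction-class labels are invented for illustration; only the cycle counts they produce come from the lecture.

```python
def next_state(state, kind):
    """Next state of a hypothetical control FSM; kind is the instruction class."""
    if state == "fetch":
        return "decode"
    if state == "decode":                      # first point where the opcode matters
        return {"LW": "mem_addr", "SW": "mem_addr", "R": "execute",
                "BEQ": "branch", "J": "jump"}[kind]
    if state == "mem_addr":
        return "mem_read" if kind == "LW" else "mem_write"
    if state == "mem_read":
        return "reg_write_mem"
    if state == "execute":
        return "reg_write_alu"
    # mem_write, reg_write_mem, reg_write_alu, branch, jump all finish the instruction
    return "fetch"

def cycles(kind):
    """Count the states visited from one fetch until the next fetch."""
    state, count = "fetch", 1
    while True:
        state = next_state(state, kind)
        if state == "fetch":
            return count
        count += 1

for kind in ("LW", "SW", "R", "BEQ", "J"):
    print(kind, cycles(kind))
```

The counts come out as 5, 4, 4, 3, 3, matching the per-instruction cycle goals stated for the multi-cycle CPU. In a Moore machine each state would also carry one fixed set of control outputs; those are omitted here.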


How to Implement FSM ?
 Manually with logic gates + FFs
 Bubble diagram, next-state table, state assignment
 Karnaugh map for each state bit, each output bit (painful!)

 High-level language description (eg, Verilog, VHDL)


 Describe FSM bubble diagram (next-states, output values)
 Automatically synthesized into gates + FFs

 Microcode (µ-code) description


 Sequence through many µ-ops for each CPU instruction
 One µ-op (µ-instruction) sends correct control signal for 1 cycle
 µ-op similar to one bubble in FSM
 Acts like a mini-CPU within a CPU
 µPC: microcode program counter
 Microcode storage memory contains µ-ops
 Can look similar to RTL or some new “assembly language”



FSM Specification: Bubble Diagram
[Figure: bubble diagram of the control FSM.]
Can build this by examining RTL
It is possible to automatically convert RTL into this form !
FSM: Gates + FFs Implementation

[Figure: FSM high-level organization.]


FSM: Microcode Implementation
[Figure: microcode storage (memory) emits the datapath control outputs and the sequencing control; the microprogram counter, an adder (+1), and the address select logic choose the next µ-op; the address select logic takes inputs from the instruction register opcode field.]


Multi-cycle CPU with Control FSM
[Figure: the multi-cycle datapath driven by the control FSM, which takes Instr[31:26] as input and produces the control outputs, including the conditional branch signal.]
Control FSM: Overview

 General approach: Finite State Machine (FSM)


 Need details in each branch of control…



Detailed FSM
[Figures: state diagrams of the control FSM; the diagrams themselves are not recoverable from the extracted text. The slides show the overall diagram and then each part in turn:]
 Instruction Fetch
 Memory Reference (LW, SW)
 R-Type Instruction
 Branch Instruction
 Jump Instruction


Performance Comparison

Single-cycle CPU
vs
Multi-cycle CPU



Simple Comparison
CPU                Instructions   Clock cycles
Single-cycle CPU   All            1
Multi-cycle CPU    LW             5
Multi-cycle CPU    SW, R-type     4
Multi-cycle CPU    BEQ, J         3
What’s really happening?
[Figure: Load Word instruction, ideal case. The single-cycle CPU performs Fetch, Decode, Calc Addr, Memory, and Write within one clock cycle; the multi-cycle CPU performs the same five steps in five shorter clock cycles.]


In practice, steps differ in speeds…
[Figure: Load Word instruction with steps of unequal length on the single-cycle and multi-cycle CPUs (Fetch, Decode, Calc Addr, Memory, Write). Annotations: "Wasted time!" and "Violation!".]
Single-cycle vs Multi-cycle
 LW instruction faster for single-cycle
[Figure: Load Word (Fetch, Decode, Calc Addr, Memory, Write) on both CPUs. Annotations: "Now wasted time is larger!" and "Violation fixed!".]
 SW instruction ~ same speed
[Figure: Store Word (Fetch, Decode, Calc Addr, Memory) on both CPUs, with the speed difference and the wasted time marked.]
 BEQ, J instruction faster for multi-cycle
[Figure: BEQ and J (Fetch, Decode, Calc Addr) on both CPUs, with the speed difference and the wasted time marked.]
Performance Summary
 Which CPU implementation is faster?
 LW → single-cycle is faster
 SW,R-type → about the same
 BEQ,J → multi-cycle is faster

 Real programs use a mix of these instructions

 Overall performance depends on instruction frequency !
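The dependence on instruction frequency can be worked through numerically. The Python sketch below weights the multi-cycle cycle counts by an instruction mix and compares average time per instruction against a single-cycle design. The mix percentages and both clock periods are hypothetical values picked for the example; only the cycle counts come from the lecture.

```python
# Cycle counts per instruction class on the multi-cycle CPU (from the lecture).
CYCLES = {"LW": 5, "SW": 4, "R-type": 4, "BEQ": 3, "J": 3}

# Hypothetical instruction mix in percent; must total 100.
MIX_PERCENT = {"LW": 25, "SW": 10, "R-type": 45, "BEQ": 15, "J": 5}
assert sum(MIX_PERCENT.values()) == 100

# Hypothetical clock periods in picoseconds.
SINGLE_PERIOD_PS = 800   # must fit the slowest whole instruction
MULTI_PERIOD_PS = 200    # must fit the slowest single step

weighted_cycles = sum(MIX_PERCENT[k] * CYCLES[k] for k in CYCLES)   # percent * cycles
avg_cpi = weighted_cycles / 100
single_time_ps = 1 * SINGLE_PERIOD_PS
multi_time_ps = weighted_cycles * MULTI_PERIOD_PS / 100

print("multi-cycle average CPI:", avg_cpi)
print("single-cycle time per instruction (ps):", single_time_ps)
print("multi-cycle time per instruction (ps):", multi_time_ps)
```

With these assumed numbers the two designs land within about 1% of each other; shifting the mix toward branches favors the multi-cycle CPU, and shifting it toward loads favors the single-cycle CPU.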


Implementation Summary
 Single-cycle CPU
 1 instruction per cycle (eg, 1MHz → 1 MIPS)
 No “wasted time” on most complex instruction
 Large wasted time on simpler instructions
 Simple controller (just a lookup table or memory)
 Simple instructions

 Multi-cycle CPU
 << 1 instruction per cycle (eg, 1MHz → 0.2 MIPS)
 Small time wasted on most complex instruction
 Hence, this instruction always slower than single-cycle CPU
 Small time wasted on simple instructions
 Eliminates “large wasted time” by using fewer clock cycles
 Complex controller (FSM)
 Potential to create complex instructions