Chapter 14 Processor Structure and Function
William Stallings, Computer Organization and Architecture, 9th Edition
Objectives
After studying this chapter, you should be able to:
Distinguish between user-visible and control/status
registers, and discuss the purposes of registers in
each category.
Summarize the instruction cycle.
Discuss the principle behind instruction pipelining
and how it works in practice.
Compare and contrast the various forms of pipeline
hazards.
Contents
14.1 Processor Organization
14.2 Register Organization
14.3 Instruction Cycle
14.4 Instruction Pipelining
14.1- Processor Organization
Processor Requirements:
Fetch instruction (from memory: register, cache, or main memory)
Interpret instruction (what action is required)
Fetch data (data from memory or an I/O module)
Process data (performing some operations on data)
Write data (writing result to memory or an I/O module)
To do these things, the processor needs to store some data
temporarily and therefore needs a small internal memory.
CPU With the System Bus and CPU Internal Structure
14.2- Register Organization
Within the processor there is a set of registers that
function as a level of memory above main memory
and cache in the hierarchy
The registers in the processor perform two roles:
User-Visible Registers
Enable the machine or assembly language programmer to
minimize main memory references by optimizing use of registers
Control and Status Registers
Used by the control unit to control the operation of the
processor and by privileged operating system programs to
control the execution of programs
User-Visible Registers
Referenced by means of the machine language that the processor
executes
Categories:
• General purpose
  • Can be assigned to a variety of functions by the programmer
• Data
  • May be used only to hold data and cannot be employed in the
    calculation of an operand address
• Address
  • May be somewhat general purpose or may be devoted to a
    particular addressing mode
  • Examples: segment pointers, index registers, stack pointer
• Condition codes
  • Also referred to as flags
  • Bits set by the processor hardware as the result of operations
Table 14.1: Condition Codes
Control and Status Registers
Four registers are essential to instruction execution:
Program counter (PC)
Contains the address of an instruction to be fetched
Instruction register (IR)
Contains the instruction most recently fetched
Memory address register (MAR)
Contains the address of a location in memory
Memory buffer register (MBR)
Contains a word of data to be written to memory or the word most
recently read
Program Status Word (PSW)
Register or set of registers that contain status information
Status information is used to decide whether a branch is taken
Common fields or flags include:
• Sign
• Zero
• Carry
• Equal
• Overflow
• Interrupt Enable/Disable
• Supervisor
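As a rough illustration of how some of these flags are produced, the sketch below derives Sign, Zero, Carry and Overflow for an 8-bit two's complement addition. The function name `add8_flags` and the dictionary layout are assumptions made for this example; real processors define their own flag sets and bit positions.

```python
def add8_flags(a: int, b: int) -> dict:
    """Add two 8-bit values and derive common condition-code flags."""
    a &= 0xFF                      # keep only the low 8 bits of each operand
    b &= 0xFF
    total = a + b                  # may need 9 bits
    result = total & 0xFF          # what an 8-bit register would hold
    sign_a, sign_b, sign_r = a >> 7, b >> 7, result >> 7
    return {
        "result": result,
        "sign": sign_r,                              # copy of the top result bit
        "zero": int(result == 0),                    # result is all zeros
        "carry": int(total > 0xFF),                  # carry out of bit 7
        # signed overflow: operands share a sign that the result does not
        "overflow": int(sign_a == sign_b and sign_r != sign_a),
    }


flags = add8_flags(0x7F, 0x01)     # +127 + 1 does not fit in signed 8 bits
print(flags)
```

A conditional branch such as "branch if zero" would then test only the `zero` entry, which is how status information feeds branching decisions.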
Example Microprocessor Register Organizations
14.3- Instruction Cycle
Includes the following stages:
Fetch
Read the next instruction from memory into the processor
Execute
Interpret the opcode and perform the indicated operation
Interrupt
If interrupts are enabled and an interrupt has occurred, save the
current process state and service the interrupt
Instruction Cycle
Loop due to additional memory accesses
Instruction Cycle State Diagram
Fetch cycle Indirect cycle Interrupt cycle
Data Flow, Fetch Cycle
Fetch cycle for the next instruction (the instruction address is in the PC)
MAR: Memory Address Register
MBR: Memory Buffer Register
The control unit (CU) examines the contents of the IR to determine
whether it contains an operand specified by indirect addressing.
If so, an indirect cycle is used (afterwards the data address is in the MBR).
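The register transfers of the fetch and indirect cycles can be sketched as follows. This is a simplified model under stated assumptions: memory is a word-addressed Python list, the PC advances by 1 per instruction, and the whole fetched word stands in for the instruction's address field. The function names are chosen for this example only.

```python
def fetch_cycle(cpu: dict, memory: list) -> None:
    """Bring the next instruction into IR by way of MAR and MBR."""
    cpu["MAR"] = cpu["PC"]             # address of the next instruction
    cpu["MBR"] = memory[cpu["MAR"]]    # memory read delivers the word to MBR
    cpu["PC"] += 1                     # assumed increment: one word per instruction
    cpu["IR"] = cpu["MBR"]             # instruction is copied into IR


def indirect_cycle(cpu: dict, memory: list) -> None:
    """Replace the address reference in MBR with the operand address it points to."""
    cpu["MAR"] = cpu["MBR"]            # address reference goes to MAR
    cpu["MBR"] = memory[cpu["MAR"]]    # memory read returns the data address


memory = [5, 0, 0, 0, 0, 9]            # word 0 refers to word 5, which holds 9
cpu = {"PC": 0, "IR": None, "MAR": None, "MBR": None}
fetch_cycle(cpu, memory)
indirect_cycle(cpu, memory)
print(cpu)
```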
Data Flow, Interrupt Cycle
(1) Store PC (return point after executing interrupt routine)
(2) Store current state (values in registers before running interrupt routine)
(3) Fetch cycle is used to load interrupt routine
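A minimal sketch of these three steps is given below. It assumes a downward-growing stack addressed by a stack pointer `SP`, treats the PSW as the only state to be saved, and takes the routine's start address as a parameter `handler_address`; all of these are illustrative assumptions, since the actual save location and saved state depend on the processor.

```python
def interrupt_cycle(cpu: dict, memory: list, handler_address: int) -> None:
    """Save the return point and state, then redirect the next fetch."""
    # (1) Store the PC so execution can resume after the routine
    cpu["SP"] -= 1
    cpu["MAR"] = cpu["SP"]
    cpu["MBR"] = cpu["PC"]
    memory[cpu["MAR"]] = cpu["MBR"]
    # (2) Store the current state (only the PSW in this sketch)
    cpu["SP"] -= 1
    cpu["MAR"] = cpu["SP"]
    cpu["MBR"] = cpu["PSW"]
    memory[cpu["MAR"]] = cpu["MBR"]
    # (3) Point the PC at the routine; the next fetch cycle loads it
    cpu["PC"] = handler_address


memory = [0] * 8
cpu = {"PC": 42, "PSW": 3, "SP": 8, "MAR": None, "MBR": None}
interrupt_cycle(cpu, memory, handler_address=100)
print(cpu["PC"], memory)
```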
14.4- Instruction Pipelining
Pipelining Strategy
A way to improve performance is to perform jobs in parallel
Similar to the use of an assembly line in a manufacturing plant,
in which some operations are performed concurrently
New inputs are accepted at one end before previously accepted
inputs appear as outputs at the other end
To apply this concept to instruction execution, we must recognize
that an instruction has a number of stages
Two-Stage Instruction Pipeline
Additional Stages
Fetch instruction (FI)
Read the next expected instruction into a buffer
Decode instruction (DI)
Determine the opcode and the operand specifiers
Calculate operands (CO)
Calculate the effective address of each source operand
This may involve displacement, register indirect, indirect, or other
forms of address calculation
Fetch operands (FO)
Fetch each operand from memory
Operands in registers need not be fetched
Execute instruction (EI)
Perform the indicated operation and store the result, if any, in the
specified destination operand location
Write operand (WO)
Store the result in memory
Timing Diagram for Instruction Pipeline Operation
I: Instruction
O: Operand
F: Fetch
C: Calculate
E: Execute
W: Write
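The timing diagram can be reproduced with a short sketch, assuming the ideal case: every stage takes exactly one time unit and no hazard or branch disturbs the flow. The function name `timing_diagram` and the `--` filler for idle slots are choices made for this example.

```python
STAGES = ["FI", "DI", "CO", "FO", "EI", "WO"]


def timing_diagram(num_instructions: int) -> list:
    """Return one row per instruction listing its stage in each time unit."""
    total_time = len(STAGES) + num_instructions - 1
    rows = []
    for i in range(num_instructions):
        row = ["--"] * total_time
        for s, stage in enumerate(STAGES):
            row[i + s] = stage     # instruction i+1 is in stage s during time unit i+s+1
        rows.append(row)
    return rows


for number, row in enumerate(timing_diagram(9), start=1):
    print(f"I{number}: " + " ".join(row))
```

With nine instructions and six stages the rows are 14 time units long, compared with 54 time units if each instruction ran all six stages before the next one started.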
The Effect of a Conditional Branch on Instruction Pipeline Operation
Suppose that instruction 3 is a conditional branch to instruction 15.
Instruction 3 executes at time 7, and instruction 15 enters the
pipeline in the following time unit.
The work already done on the instructions fetched after
instruction 3 is wasted.
Six-Stage Instruction Pipeline
Figure 14.12 indicates the logic needed for pipelining to account
for branches and interrupts
Alternative Pipeline Depiction
I3 is a conditional branch to I15
Speedup Factors with Instruction Pipelining
Number of instructions that are executed without a branch
The larger the number of pipeline stages, the greater the
potential for speedup, but at a higher cost
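The relationship between stage count and speedup can be sketched with the usual idealized model: k stages of equal duration tau, and n instructions executed with no branches or stalls. Under those assumptions the pipelined time is (k + (n - 1)) * tau, against n * k * tau without pipelining. The function names are chosen for this example.

```python
def pipeline_time(k: int, n: int, tau: float = 1.0) -> float:
    """Time for n instructions on a k-stage pipeline, ideal case."""
    return (k + (n - 1)) * tau


def speedup(k: int, n: int) -> float:
    """Ratio of unpipelined time (n * k * tau) to pipelined time."""
    return (n * k) / (k + n - 1)


for n in (1, 9, 100, 10_000):
    print(n, round(speedup(6, n), 3))
```

As n grows, the speedup approaches k, which is why more stages offer more potential speedup; the model ignores the added cost and the branch penalties that limit it in practice.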
Pipeline Hazards
Occur when the pipeline, or some portion of the pipeline, must stall
because conditions do not permit continued execution
Also referred to as a pipeline bubble
There are three types of hazards:
• Resource
• Data
• Control
Resource Hazards
A resource hazard occurs when two or more instructions that are
already in the pipeline need the same resource
The result is that the instructions must be executed in serial rather
than parallel for a portion of the pipeline
A resource hazard is sometimes referred to as a structural hazard
Figure annotation: FO is accessing memory, so this step is idle
Data Hazards
A data hazard occurs when there is a conflict in the access of an
operand location
Example: RAW hazard with x86 instructions
An instruction is still executing and register EAX is being written,
so EAX cannot be read yet.
Types of Data Hazard
Read after write (RAW), or true dependency
An instruction modifies a register or memory location
Succeeding instruction reads data in memory or register location
Hazard occurs if the read takes place before write operation is complete
Write after read (WAR), or antidependency
An instruction reads a register or memory location
Succeeding instruction writes to the location
Hazard occurs if the write operation completes before the read operation
takes place
Write after write (WAW), or output dependency
Two instructions both write to the same location
Hazard occurs if the write operations take place in the reverse order of the
intended sequence
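The three definitions can be checked mechanically by comparing what two instructions read and write. The sketch below reports the dependencies between a first instruction and a succeeding one; whether a dependency becomes an actual hazard still depends on pipeline timing. The function name and the register sets in the example are illustrative assumptions.

```python
def data_dependencies(first_reads, first_writes, second_reads, second_writes):
    """Return the dependency types from a first instruction to a succeeding one."""
    found = []
    if set(first_writes) & set(second_reads):
        found.append("RAW")    # second reads what first writes
    if set(first_reads) & set(second_writes):
        found.append("WAR")    # second writes what first reads
    if set(first_writes) & set(second_writes):
        found.append("WAW")    # both write the same location
    return found


# Assumed pair: ADD EAX, EBX followed by SUB ECX, EAX
print(data_dependencies(
    first_reads={"EAX", "EBX"}, first_writes={"EAX"},
    second_reads={"ECX", "EAX"}, second_writes={"ECX"},
))
```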
Control Hazard
Also known as a branch hazard
Occurs when the pipeline makes the wrong decision
on a branch prediction
Brings instructions into the pipeline that must
subsequently be discarded
Dealing with Branches:
Multiple streams
Prefetch branch target
Loop buffer
Branch prediction
Delayed branch
Multiple Streams
A simple pipeline suffers a penalty for a
branch instruction because it must
choose one of two instructions to fetch
next and may make the wrong choice
A brute-force (exhaustive) approach is to replicate the
initial portions of the pipeline and allow
the pipeline to fetch both instructions,
making use of two streams
Drawbacks:
• With multiple pipelines there are contention delays for
access to the registers and to memory
• Additional branch instructions may enter the pipeline
before the original branch decision is resolved
Prefetch Branch Target
When a conditional branch is
recognized, the target of the branch is
prefetched, in addition to the
instruction following the branch
Target is then saved until the branch
instruction is executed
If the branch is taken, the target has
already been prefetched
IBM 360/91 uses this approach
Loop Buffer
Small, very-high-speed memory maintained by the instruction
fetch stage of the pipeline and containing the n most recently
fetched instructions, in sequence
Benefits:
Instructions fetched in sequence will be available without the usual
memory access time
If a branch occurs to a target just a few locations ahead of the
address of the branch instruction, the target will already be in the
buffer
This strategy is particularly well suited to dealing with loops
Similar in principle to a cache dedicated to instructions. Differences:
• The loop buffer only retains instructions in sequence
• Is much smaller in size and hence lower in cost
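The behavior of a loop buffer on a backward branch can be sketched with a fixed-size queue of recently fetched addresses. This is a simplification under stated assumptions: it tracks addresses only, not instruction contents, and it does not model instructions prefetched ahead of the current one. The class name `LoopBuffer` and its methods are hypothetical.

```python
from collections import deque


class LoopBuffer:
    """Remembers the addresses of the n most recently fetched instructions."""

    def __init__(self, size: int):
        self.addresses = deque(maxlen=size)   # oldest entry drops out when full

    def record_fetch(self, address: int) -> None:
        self.addresses.append(address)

    def contains(self, target: int) -> bool:
        """True if a branch to target can be served from the buffer."""
        return target in self.addresses


buffer = LoopBuffer(size=8)
for address in range(100, 110):      # sequential fetches 100..109
    buffer.record_fetch(address)

print(buffer.contains(104))          # short loop: target still buffered
print(buffer.contains(100))          # loop body longer than the buffer
```

A loop whose body fits within the buffer is served entirely from it after the first iteration, which is why the strategy suits loops.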
Branch Prediction
Various techniques can be used to predict whether a
branch will be taken:
1. Predict never taken
2. Predict always taken
3. Predict by opcode
These approaches are static
They do not depend on the execution history up to the time of the
conditional branch instruction
1. Taken/not taken switch
2. Branch history table
These approaches are dynamic
They depend on the execution history
How are predictions carried out?
The outcomes of the most recent executions (a few bits) must be
stored in a cache
Branch Prediction Flow Chart
If only one bit is stored, a loop may cause 2 errors in prediction:
once on entering and once on exiting.
If 2 bits are stored, the prediction algorithm uses the outcomes of
the last 2 executions of the branch (Fig. 14.18)
Branch Prediction State Diagram
The decision process can be represented more compactly by a
finite-state machine
A finite-state machine is a way to express a process in which each
input symbol determines the next step (state transition) of the
process.
History bits are stored: 0: Not taken, 1: Taken. A history might be,
for example, 01110
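The difference between one and two history bits can be sketched as a small state machine that changes its prediction only after a given number of consecutive wrong predictions: 1 models a single bit, 2 models two bits. The initial prediction of "taken" and the function name are assumptions made for this example.

```python
def count_mispredictions(outcomes, wrong_needed: int) -> int:
    """Count wrong predictions for one branch.

    outcomes: sequence of 1 (taken) and 0 (not taken).
    wrong_needed: consecutive wrong predictions required to flip the prediction.
    """
    prediction = 1            # assumed starting state: predict taken
    wrong_streak = 0
    mispredictions = 0
    for outcome in outcomes:
        if outcome == prediction:
            wrong_streak = 0                  # a correct guess resets the streak
        else:
            mispredictions += 1
            wrong_streak += 1
            if wrong_streak == wrong_needed:  # enough evidence: change prediction
                prediction = outcome
                wrong_streak = 0
    return mispredictions


# A loop-closing branch: taken 4 times, then not taken; the loop runs 3 times
history = [1, 1, 1, 1, 0] * 3
print(count_mispredictions(history, wrong_needed=1))
print(count_mispredictions(history, wrong_needed=2))
```

With one bit, each run of the loop after the first costs two wrong predictions (entry and exit); with two bits, only the exit is mispredicted.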
Dealing With Branches: Branch History Table
Each prefetch triggers a lookup in the branch history table.
No match: the next sequential address is used for the fetch.
Match: a prediction is made based on the state of the instruction.
Either the next sequential address or the branch target address is
fed to the select logic.
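The lookup logic can be sketched with a dictionary keyed by instruction address. The entry layout (`target`, `predict_taken`), the function name, and the default instruction length of 1 are assumptions for illustration; a hardware table would hold history bits and update them after each branch resolves.

```python
def next_fetch_address(table: dict, address: int, instruction_length: int = 1) -> int:
    """Choose the next address to fetch after prefetching the given address."""
    entry = table.get(address)
    if entry is None:                    # no match: fetch the next sequential address
        return address + instruction_length
    if entry["predict_taken"]:           # match, state says taken: use branch target
        return entry["target"]
    return address + instruction_length  # match, state says not taken


history_table = {
    20: {"target": 5, "predict_taken": True},
    30: {"target": 50, "predict_taken": False},
}
for address in (10, 20, 30):
    print(address, "->", next_fetch_address(history_table, address))
```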
Delayed Branch
It is possible to improve pipeline performance
by automatically rearranging instructions
within a program, so that branch instructions
occur later than actually desired. This
intriguing approach is examined in Chapter 15.
Intel 80486 Pipelining
Fetch
Objective is to fill the prefetch buffers with new data as soon as the old data
have been consumed by the instruction decoder
Operates independently of the other stages to keep the prefetch buffers full
Decode stage 1
All opcode and addressing-mode information is decoded in the D1 stage
3 bytes of instruction are passed to the D1 stage from the prefetch buffers
D1 decoder can then direct the D2 stage to capture the rest of the instruction
Decode stage 2
Expands each opcode into control signals for the ALU
Also controls the computation of the more complex addressing modes
Execute
Stage includes ALU operations, cache access, and register update
Write back
Updates registers and status flags modified during the preceding execute
stage
80486 Instruction Pipeline Examples
Exercises
14.1 What general roles are performed by processor registers?
14.2 What categories of data are commonly supported by user-
visible registers?
14.3 What is the function of condition codes?
14.4 What is a program status word?
14.5 Why is a two-stage instruction pipeline unlikely to cut the
instruction cycle time in half, compared with the use of no
pipeline?
14.6 List and briefly explain various ways in which an
instruction pipeline can deal with conditional branch
instructions.
14.7 How are history bits used for branch prediction?
14.8 What would be the value of the following flags: Carry, Zero,
Overflow, Sign, Even Parity, Half-Carry?
(a) If the last operation performed on a computer with an 8-bit
word was an addition in which the two operands were
00000010 and 00000011.
(b) Repeat for the addition of -1 (twos complement) and +1.
(c) A - B, where A contains 11110000 and B contains 0010100.
Summary
Chapter 14 Processor Structure and Function
Processor organization
Register organization
• User-visible registers
• Control and status registers
Instruction cycle
• The indirect cycle
• Data flow
Instruction pipelining
• Pipelining strategy
• Pipeline performance
• Pipeline hazards
• Dealing with branches
• Intel 80486 pipelining