UNIT 2.1 CPU Building Blocks Reg Org
Processor Structure
and Function
14.1 Processor Organization
14.2 Register Organization
User-Visible Registers
Control and Status Registers
Example Microprocessor Register Organizations
14.3 Instruction Cycle
The Indirect Cycle
Data Flow
14.4 Instruction Pipelining
Pipelining Strategy
Pipeline Performance
Pipeline Hazards
Dealing with Branches
Intel 80486 Pipelining
14.5 The x86 Processor Family
Register Organization
Interrupt Processing
14.6 The Arm Processor
Processor Organization
Processor Modes
Register Organization
Interrupt Processing
14.7 Key Terms, Review Questions, and Problems
Learning Objectives
After studying this chapter, you should be able to:
■■ Distinguish between user-visible and control/status registers, and discuss the
purposes of registers in each category.
■■ Summarize the instruction cycle.
■■ Discuss the principle behind instruction pipelining and how it works in
practice.
■■ Compare and contrast the various forms of pipeline hazards.
■■ Present an overview of the x86 processor structure.
■■ Present an overview of the ARM processor structure.
This chapter discusses aspects of the processor not yet covered in Part Three and sets
the stage for the discussion of RISC and superscalar architecture in Chapters 15 and 16.
We begin with a summary of processor organization. Registers, which form
the internal memory of the processor, are then analyzed. We are then in a position
to return to the discussion (begun in Section 3.2) of the instruction cycle. A descrip-
tion of the instruction cycle and a common technique known as instruction pipelin-
ing complete our description. The chapter concludes with an examination of some
aspects of the x86 and ARM organizations.
Figure 14.1 The CPU with the System Bus (components: registers, ALU, control unit, system bus)
of the interconnection structures described in Chapter 3. The reader will recall that
the major components of the processor are an arithmetic and logic unit (ALU) and
a control unit (CU). The ALU does the actual computation or processing of data.
The control unit controls the movement of data and instructions into and out of the
processor and controls the operation of the ALU. In addition, the figure shows a
minimal internal memory, consisting of a set of storage locations, called registers.
Figure 14.2 is a slightly more detailed view of the processor. The data transfer
and logic control paths are indicated, including an element labeled internal
processor bus. This element is needed to transfer data between the various registers
and the ALU because the ALU in fact operates only on data in the internal processor
memory. The figure also shows typical basic elements of the ALU. Note the
similarity between the internal structure of the computer as a whole and the internal
structure of the processor. In both cases, there is a small collection of major elements
(computer: processor, I/O, memory; processor: control unit, ALU, registers)
connected by data paths.
Figure 14.2 Internal Structure of the CPU
User-Visible Registers
A user-visible register is one that may be referenced by means of the machine
language that the processor executes. We can characterize these in the following
categories:
■■ General purpose
■■ Data
■■ Address
■■ Condition codes
General-purpose registers can be assigned to a variety of functions by the programmer.
Sometimes their use within the instruction set is orthogonal to the operation.
That is, any general-purpose register can contain the operand for any opcode.
This provides true general-purpose register use. Often, however, there are restrictions.
For example, there may be dedicated registers for floating-point and stack operations.
In some cases, general-purpose registers can be used for addressing functions
(e.g., register indirect, displacement). In other cases, there is a partial or clean sep-
aration between data registers and address registers. Data registers may be used
only to hold data and cannot be employed in the calculation of an operand address.
In some machines, a subroutine call will result in the automatic saving of all
user-visible registers, to be restored on return. The processor performs the saving
and restoring as part of the execution of call and return instructions. This allows
each subroutine to use the user-visible registers independently. On other machines,
it is the responsibility of the programmer to save the contents of the relevant user-
visible registers prior to a subroutine call, by including instructions for this purpose
in the program.
to the system bus are staged and the bits to be read from the data bus are temporar-
ily stored.
Typically, the processor updates the PC after each instruction fetch so that the
PC always points to the next instruction to be executed. A branch or skip instruc-
tion will also modify the contents of the PC. The fetched instruction is loaded into
an IR, where the opcode and operand specifiers are analyzed. Data are exchanged
with memory using the MAR and MBR. In a bus-organized system, the MAR connects
directly to the address bus, and the MBR connects directly to the data bus.
User-visible registers, in turn, exchange data with the MBR.
The four registers just mentioned are used for the movement of data between
the processor and memory. Within the processor, data must be presented to the
ALU for processing. The ALU may have direct access to the MBR and user-visible
registers. Alternatively, there may be additional buffering registers at the boundary
to the ALU; these registers serve as input and output registers for the ALU and
exchange data with the MBR and user-visible registers.
Many processor designs include a register or set of registers, often known as
the program status word (PSW), that contain status information. The PSW typic-
ally contains condition codes plus other status information. Common fields or flags
include the following:
■■ Sign: Contains the sign bit of the result of the last arithmetic operation.
■■ Zero: Set when the result is 0.
■■ Carry: Set if an operation resulted in a carry (addition) into or borrow (sub-
traction) out of a high-order bit. Used for multiword arithmetic operations.
■■ Equal: Set if a logical compare result is equality.
■■ Overflow: Used to indicate arithmetic overflow.
■■ Interrupt Enable/Disable: Used to enable or disable interrupts.
■■ Supervisor: Indicates whether the processor is executing in supervisor or
user mode. Certain privileged instructions can be executed only in supervisor
mode, and certain areas of memory can be accessed only in supervisor mode.
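As a rough illustration of how the condition-code flags listed above could be derived, the following sketch performs an 8-bit addition and computes sign, zero, carry, and overflow. The 8-bit width, the function names, and the dictionary layout are assumptions made for this example only; they do not describe any particular processor's PSW. The second function shows one way the carry flag can support multiword arithmetic.

```python
def add8_with_flags(a: int, b: int, carry_in: int = 0) -> tuple[int, dict[str, int]]:
    """Add two 8-bit values plus a carry-in and derive PSW-style flags."""
    a &= 0xFF
    b &= 0xFF
    full = a + b + (carry_in & 1)      # Python ints are unbounded, so bit 8 survives
    result = full & 0xFF               # what an 8-bit register would hold
    flags = {
        "sign": (result >> 7) & 1,     # copy of the result's high-order bit
        "zero": int(result == 0),      # set when the result is 0
        "carry": int(full > 0xFF),     # carry out of the high-order bit
        # Signed overflow: both operands share a sign that the result lacks.
        "overflow": int(((a ^ result) & (b ^ result) & 0x80) != 0),
    }
    return result, flags


def add16_via_bytes(x: int, y: int) -> tuple[int, int]:
    """Multiword addition: chain two 8-bit adds through the carry flag."""
    low, low_flags = add8_with_flags(x & 0xFF, y & 0xFF)
    high, high_flags = add8_with_flags(x >> 8, y >> 8, carry_in=low_flags["carry"])
    return (high << 8) | low, high_flags["carry"]
```

For instance, adding 0x7F and 0x01 produces 0x80 with the sign and overflow flags set and the carry flag clear, whereas adding 0xFF and 0x01 produces 0 with the zero and carry flags set.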
A number of other registers related to status and control might be found in a
particular processor design. There may be a pointer to a block of memory contain-
ing additional status information (e.g., process control blocks). In machines using
vectored interrupts, an interrupt vector register may be provided. If a stack is used
to implement certain functions (e.g., subroutine call), then a system stack pointer is
needed. A page table pointer is used with a virtual memory system. Finally, regis-
ters may be used in the control of I/O operations.
A number of factors go into the design of the control and status register organ-
ization. One key issue is operating system support. Certain types of control infor-
mation are of specific utility to the operating system. If the processor designer has
a functional understanding of the operating system to be used, then the register
organization can to some extent be tailored to the operating system.
Another key design decision is the allocation of control information between
registers and memory. It is common to dedicate the first (lowest) few hundred or
thousand words of memory for control purposes. The designer must decide how
much control information should be in registers and how much in memory. The
usual trade-off of cost versus speed arises.
Figure 14.3 Example Microprocessor Register Organizations: (a) MC68000
two functional components, saving one bit on each register specifier. This seems a
reasonable compromise between complete generality and code compaction.
The Intel 8086 takes a different approach to register organization. Every
register is special purpose, although some registers are also usable as general pur-
pose. The 8086 contains four 16-bit data registers that are addressable on a byte
or 16-bit basis, and four 16-bit pointer and index registers. The data registers can
be used as general purpose in some instructions. In others, the registers are used
implicitly. For example, a multiply instruction always uses the accumulator. The
four pointer registers are also used implicitly in a number of operations; each
contains a segment offset. There are also four 16-bit segment registers. Three of
the four segment registers are used in a dedicated, implicit fashion, to point to
the segment of the current instruction (useful for branch instructions), a segment
containing data, and a segment containing a stack, respectively. These dedicated
and implicit uses provide for compact encoding at the cost of reduced flexibility.
The 8086 also includes an instruction pointer and a set of 1-bit status and control
flags.
The point of this comparison should be clear. There is no universally accepted
philosophy concerning the best way to organize processor registers [TOON81]. As
with overall instruction set design and so many other processor design issues, it is
still a matter of judgment and taste.
A second instructive point concerning register organization design is illustrated
in Figure 14.3c. This figure shows the user-visible register organization for
the Intel 80386 [ELAY85], which is a 32-bit microprocessor designed as an extension
of the 8086.¹ The 80386 uses 32-bit registers. However, to provide upward
compatibility for programs written on the earlier machine, the 80386 retains the
original register organization embedded in the new organization. Given this design
constraint, the architects of the 32-bit processors had limited flexibility in designing
the register organization.
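The idea of an older register organization embedded in a newer one can be sketched in software. The class below models one 32-bit register whose low 16 bits remain addressable as a 16-bit register, which in turn is addressable as two bytes. The names eax, ax, ah, and al follow x86 naming; the class itself, and the choice to make only the 16-bit view writable, are assumptions made to keep the illustration short.

```python
class Register32:
    """One 32-bit register with the older 16-bit and 8-bit views embedded in it."""

    def __init__(self, value: int = 0) -> None:
        self.eax = value & 0xFFFFFFFF          # full 32-bit contents

    @property
    def ax(self) -> int:
        return self.eax & 0xFFFF               # low 16 bits: the original register

    @ax.setter
    def ax(self, value: int) -> None:
        # A 16-bit write leaves the upper 16 bits of the 32-bit register untouched.
        self.eax = (self.eax & 0xFFFF0000) | (value & 0xFFFF)

    @property
    def ah(self) -> int:
        return (self.eax >> 8) & 0xFF          # high byte of the 16-bit view

    @property
    def al(self) -> int:
        return self.eax & 0xFF                 # low byte of the 16-bit view
```

A program written for the 16-bit machine sees only ax, ah, and al, and its results are unaffected by whatever occupies the upper half of eax.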
In Section 3.2, we described the processor’s instruction cycle (Figure 3.9). To recall,
an instruction cycle includes the following stages:
■■ Fetch: Read the next instruction from memory into the processor.
■■ Execute: Interpret the opcode and perform the indicated operation.
■■ Interrupt: If interrupts are enabled and an interrupt has occurred, save the
current process state and service the interrupt.
We are now in a position to elaborate somewhat on the instruction cycle. First,
we must introduce one additional stage, known as the indirect cycle.
¹ Because the MC68000 already uses 32-bit registers, the MC68020 [MACD84], which is a full
32-bit architecture, uses the same register organization.
Data Flow
The exact sequence of events during an instruction cycle depends on the design of
the processor. We can, however, indicate in general terms what must happen. Let us
assume a processor that employs a memory address register (MAR), a memory
buffer register (MBR), a program counter (PC), and an instruction register (IR).
During the fetch cycle, an instruction is read from memory. Figure 14.6 shows
the flow of data during this cycle. The PC contains the address of the next instruc-
tion to be fetched. This address is moved to the MAR and placed on the address
bus. The control unit requests a memory read, and the result is placed on the data
bus and copied into the MBR and then moved to the IR. Meanwhile, the PC is
incremented by 1, preparatory for the next fetch.
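The register transfers of the fetch cycle can be written out as a minimal sketch. The CPU class, its field names, and the use of a Python list as word-addressed memory are assumptions for illustration; a real processor performs some of these steps in parallel rather than in the strict order shown.

```python
from dataclasses import dataclass


@dataclass
class CPU:
    memory: list[int]   # word-addressed main memory (assumed model)
    pc: int = 0         # program counter
    mar: int = 0        # memory address register
    mbr: int = 0        # memory buffer register
    ir: int = 0         # instruction register


def fetch(cpu: CPU) -> None:
    """Perform one fetch cycle using the four registers named in the text."""
    cpu.mar = cpu.pc                 # PC -> MAR; the address goes onto the address bus
    cpu.mbr = cpu.memory[cpu.mar]    # memory read: data bus -> MBR
    cpu.pc += 1                      # PC incremented, preparatory for the next fetch
    cpu.ir = cpu.mbr                 # MBR -> IR
```

After one call to fetch on a CPU whose PC is 0, the IR holds the word stored at address 0 and the PC holds 1.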
[Figure: instruction cycle stages (fetch, indirect, execute, interrupt)]
[Figure 14.6: data flow during the fetch cycle (CPU: PC, MAR, MBR, IR, control unit; memory)]
Once the fetch cycle is over, the control unit examines the contents of the IR
to determine if it contains an operand specifier using indirect addressing. If so, an
indirect cycle is performed. As shown in Figure 14.7, this is a simple cycle. The
rightmost N bits of the MBR, which contain the address reference, are transferred to
the MAR. Then the control unit requests a memory read, to get the desired address
of the operand into the MBR.
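The indirect cycle amounts to one masking step and one memory read, as the sketch below suggests. The value N = 12, the function name, and the list used as memory are assumptions chosen for the example; the text does not fix a value for N.

```python
ADDRESS_BITS = 12   # assumed value of N, the width of the address field


def indirect(memory: list[int], mbr: int) -> tuple[int, int]:
    """Perform one indirect cycle and return the new (MAR, MBR) contents."""
    mar = mbr & ((1 << ADDRESS_BITS) - 1)   # rightmost N bits of MBR -> MAR
    mbr = memory[mar]                       # memory read: operand address -> MBR
    return mar, mbr
```

If the MBR holds 0x1005 and memory location 5 holds 9, the cycle leaves 5 in the MAR and 9, the address of the operand, in the MBR.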
The fetch and indirect cycles are simple and predictable. The execute cycle
takes many forms; the form depends on which of the various machine instructions
is in the IR. This cycle may involve transferring data among registers, reading from or
writing to memory or I/O, and/or invoking the ALU.
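To make the dependence on the instruction in the IR concrete, the sketch below dispatches on an opcode. The instruction format (a 4-bit opcode followed by a 12-bit address), the three opcodes, the 16-bit accumulator, and all names are hypothetical and were chosen only for illustration.

```python
# Hypothetical opcodes for a 16-bit instruction: 4-bit opcode, 12-bit address.
LOAD, STORE, ADD = 0x1, 0x2, 0x5


def execute(ir: int, ac: int, memory: list[int]) -> int:
    """Execute the instruction held in the IR and return the new accumulator value."""
    opcode, address = ir >> 12, ir & 0xFFF
    if opcode == LOAD:                          # read from memory into a register
        ac = memory[address]
    elif opcode == STORE:                       # write a register to memory
        memory[address] = ac
    elif opcode == ADD:                         # invoke the ALU
        ac = (ac + memory[address]) & 0xFFFF    # keep the result to 16 bits
    else:
        raise ValueError(f"unknown opcode {opcode:#x}")
    return ac
```

Each branch corresponds to one of the forms the execute cycle can take: a read from memory, a write to memory, or an ALU operation.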
Like the fetch and indirect cycles, the interrupt cycle is simple and predictable
(Figure 14.8). The current contents of the PC must be saved so that the processor
can resume normal activity after the interrupt. Thus, the contents of the PC are
transferred to the MBR to be written into memory. The address of the special memory
location reserved for this purpose is loaded into the MAR from the control unit. It might,
for example, be a stack pointer. The PC is loaded with the address of the interrupt
routine. As a result, the next instruction cycle will begin by fetching the appropriate
instruction.
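The interrupt cycle can be sketched in the same register-transfer style. The sketch assumes that the save location is supplied by a stack pointer that grows toward lower addresses and that the address of the interrupt routine is passed in as a parameter; the class, field, and parameter names are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class CPU:
    memory: list[int]   # word-addressed main memory (assumed model)
    pc: int = 0         # program counter
    mar: int = 0        # memory address register
    mbr: int = 0        # memory buffer register
    sp: int = 0         # stack pointer (assumed source of the save address)


def interrupt(cpu: CPU, handler_address: int) -> None:
    """Save the PC in memory and redirect the processor to the interrupt routine."""
    cpu.mbr = cpu.pc                  # PC -> MBR, to be written into memory
    cpu.sp -= 1                       # assumption: the stack grows downward
    cpu.mar = cpu.sp                  # save address -> MAR
    cpu.memory[cpu.mar] = cpu.mbr     # memory write: the old PC is saved
    cpu.pc = handler_address          # the next fetch starts the interrupt routine
```

The saved PC value is what allows the processor to resume normal activity once the interrupt routine has finished.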
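[Figure 14.7: data flow during the indirect cycle (CPU: MAR, MBR, control unit; memory)]
[Figure 14.8: data flow during the interrupt cycle (CPU: PC, MAR, MBR, control unit; memory)]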