Microprocessors & Microcontrollers
ISBN 978 - 81 -8431-297-3
All rights reserved with Technical Publications. No part of this book should be
reproduced in any form, Electronic, Mechanical, Photocopy or any information storage and
retrieval system without prior permission in writing, from Technical Publications, Pune.
Published by :
Technical Publications Pune®
#1, Amit Residency, 412, Shanivar Peth, Pune - 411 030, India
Printers :
Vikram Printers
34, Parvati Industrial Estate
Pune-Satara Road,
Pune - 411009.
Thanks to professors, students and authors of various technical books for their overwhelming
response to our books. Looking at the feedback and the response we received from previous books,
we are very pleased to release a text book on Microprocessors & Microcontrollers.
The purpose of this book is to fulfil the need for a text written in plain, lucid and simple everyday
language. This book provides a logical method of explanation and prepares a background of each
topic with essential illustrations. The text includes a number of solved design examples which
help students to understand the applications of microprocessor and microcontroller based systems.
The rapid spread of microprocessors and microcontrollers in society has both simplified and
complicated our lives. The aim of this text is to introduce concepts related to microprocessors and,
with that background, to discuss details of the microcontroller family.
The text basically covers details of the Pentium microprocessor and the 8051 microcontroller: their
architecture, instruction set and programming, and interfacing with keyboard, display and other
devices. It also discusses the operating modes of the Pentium processor, its I/O organisation and memory
organisation. The text also introduces PIC microcontrollers.
Acknowledgement
We wish to express our profound thanks to all those who helped in making this book a reality.
Much needed moral support and encouragement was provided on numerous occasions by our whole
family.
We are specially grateful to the great teacher Prof. A.V. Bakshi for his time to time, much
needed, valuable guidance. Without the full support and cheerful encouragement of Mr. Uday
Bakshi the book would not have been completed in time.
Finally, we wish to thank Mr. Avinash Wani, Mr. Ravindra Wani and the entire team of
Technical Publications who have taken immense pains to get the quality printing in time.
Any suggestions for the improvement of the book will be acknowledged and appreciated.
Atul Godse
Deepali Godse
Dedicated to My Parents
1.3 Pentium Architecture and Functional Description
1.4 Pin Description
1.5.1 Real Mode Programming Model
1.5.2 Memory Addressing in Real Mode
1.5.3 Handling Interrupts and Exceptions in Real Mode
1.7 Pentium Super-scalar Architecture
1.8 Pipelining
1.9 Instruction Pairing Rules
1.11.1 Cache Memory
1.11.2 Two Level Cache System
1.11.3 Pentium Cache Organisation
1.12 Floating Point Unit
2.2 RESET Operation
2.3 Bus Operations and Bus Cycles
2.4 Bus Cycle States
2.5 Non-Pipelined Bus Cycles
2.5.1 Non-pipelined Read Cycle
2.5.2 Non-pipelined Write Cycle
2.6 Pipelined Read/Write Cycle
2.7 Burst Cycle
2.8 Memory Organisation
2.9 I/O Organisation
2.9.1 I/O Mapped I/O
2.9.2 Memory Mapped I/O
2.10 Data Transfer Mechanism - 8-bit, 16-bit, 32-bit and 64-bit
Review Questions
3.2 Programmer's Model
3.2.1 General Purpose Registers
3.2.2 Segment Registers
3.2.3 Index, Pointers, and Base Registers
3.2.4 EFLAGs Register
3.2.4.1 Status Flags
3.2.4.2 Control Flags
3.2.4.3 System Flags
3.2.5 More about EFLAGs
3.2.6 System Address Registers
3.2.7 System Registers
3.2.7.1 Control Registers
3.2.7.2 Debug Registers
3.2.7.3 Test Registers
3.3 Pentium Addressing Modes
3.4 Pentium Data Types
3.5 Instruction Set Summary
3.5.1 Data Transfer Instructions
3.5.2 Binary Arithmetic Instructions
3.5.3 Decimal Arithmetic Instructions
3.5.4 Logical Instructions
3.5.6 Bit and Byte Instructions
3.5.8 String Instructions
Instruct
3.6.10 ENTER and Companion LEAVE Instructions
3.6.11 Flag Control (EFLAG) Instructions
3.6.12 Segment Register Instructions
4.2 Protected Mode-Support Registers
4.3 Logical to Physical Address Translation
4.4 Segmentation
4.5 Segment Descriptors and Memory Management through Segmentation
4.5.1 Types of Segment Descriptors
4.5.1.1 Non-system Segment Descriptor
4.5.1.2 System Segment Descriptors
4.5.2 Descriptor Tables
4.5.3 More about Segment Registers
4.6 Paging
4.6.1 Support Registers and Tables
4.6.2 PDE Descriptor
4.6.3 PTE Descriptor
4.7 Translation Lookaside Buffer or Page Translation Cache
4.8 Paging Operation
4.9 Protection
4.9.1 Protection By Segmentation
4.9.2 Privilege Level Protection
4.9.2.1 Restricting Access to Data
4.9.2.2 Accessing Data in Code Segments
4.9.2.3 Restricting Control Transfers
4.9.3 Inter-privilege Level Transfer of Control
4.9.3.1 Conforming Code Segment
4.9.3.2 Call Gates
4.9.4 Changing Stacks
4.9.5 Page Level Protection
4.9.5.1 Restricting Addressable Domain
4.9.5.2 Type Checking
4.10 Privileged Instructions
4.10.1 Privileged Instructions
4.11 Special Protection Mode Instructions
4.12 Demand Paging
4.13 Moving to Protected Mode
4.14 Switching Back to Real Address Mode
4.15 Virtual Memory
5.2 Scheduling Methods for Multi-user Operating System
5.2.1 Time-Slice Scheduling
5.2.2 Pre-emptive - Priority Based Scheduling
5.2.3 Context Switching
5.3 Support Registers and Related Descriptors for Multitasking
5.3.1 Task State Segment (TSS)
G2 iT Sh LD SMT DMM eterna te tet tate ate tite teeta te lett testes
5.3.3 Task Register (TR)
5.3.4 Task Gates and Task Gate Descriptor
5.4 Task Switching
5.4.1 Task Switching Without Task Gate
5.4.2 Task Switching with Task Gate
5.5.1 I/O Privilege Level
5.5.2 I/O Permission Bit Map
6.2 Entering and Leaving 8086 Virtual Mode
6.2.1 Entering 8086 Virtual Mode
6.2.2 Leaving 8086 Virtual Mode
6.3 Registers and Instructions
6.3.1 Registers
8.1 Introduction
8.2 Features of 8051
8.3 MCS-51 (8051) Family Architecture
8.3.1 Pin-out of 8051
8.3.2 Central Processing Unit (CPU)
8.3.3 On-chip Data Memory and Register Bank
8.3.4 On-chip Program Memory
8.3.5 Input/Output Ports
8.3.6 Register Set
8.3.6.1 Register A (Accumulator)
8.3.6.2 Register B
8.3.6.3 Program Status Word (Flag Register)
8.3.6.4 Stack and Stack Pointer
8.3.6.5 Data Pointer (DPTR)
8.3.6.6 Program Counter
8.3.6.7 Special Function Registers
8.3.7 The 8051 Oscillator and Clock
8.4 Memory Organization in 8051
8.5 Input/Output Pins, Ports and Circuits
8.6 External Data Memory and Program Memory
8.6.1 External Program Memory
8.6.2 External Data Memory
8.6.3 Important Points to Remember in Accessing External Memory
8.7 Timers/Counters and their Programming
8.7.2 Timer/Counter Control Logic
8.7.3 Timer 0 and Timer 1
8.7.4 Programming
8.8 Serial Port and their Programming
8.8.1 Operating Modes for Serial Port
8.8.2 Serial Port Control Register
8.8.3 Generating Baud Rates
8.8.4 Programming 8051 for Serial Data Transfer
8.8.5 Programming 8051 for Receiving Serial Data
8.9 Interrupt Structure
8.9.1 Priority Level Structure
8.9.2 External Interrupts
8.9.3 Single-Step Operation
8.10 Other Features
8.10.1 Power Saving Options
8.10.2 Idle Mode
8.10.3 Power Down Mode
8.10.4 Multiprocessor Communication in 8051
Review Questions
‘Minimum Sys
9.1 Introduction
9.2 Minimum System
9.2.1 Supporting Circuits
9.2.1.1 Clock Circuits
9.2.1.2 DemuttiplexingP,, - Pas -
9.2.1.3 Reset Circuit
9.2.2 Memory Interfacing
9.2.3 Interfacing Example
9.3 8051 I/O Expansion using 8256
9.4 Interfacing Keyboard
9.4.1 Key Debounce using Hardware
9.4.2 Key Debouncing using Software
9.4.3 Simple Keyboard Interface
9.4.4 Matrix Keyboard Interface
9.5 Interfacing Display
9.5.1 LED Displays
9.5.2 Interfacing LED Displays
9.6 Interfacing LCD Display
9.7 Interfacing DAC to 8051
9.7.1.1 Important Electrical Characteristics for IC 1408
9.7.2 Interfacing DAC 1408 / DAC 0808 with 8051
9.8 Interfacing ADC to 8051
9.8.1 ADC 0804 Family
9.8.2 ADC 0808/0809 Family
9.8.3 Interfacing of ADC 0803/0804/0805 with 8051
9.9 Stepper Motor Interfacing
9.10 Typical MCS-51 Based System
9.11 Interfacing Examples
10.3 PICs with Key Features
10.4 PIC16C6X
10.4.1 Features of 16C6X Microcontroller
10.4.1.1 Core Features
10.4.1.2 Peripheral Features
10.4.2 Block Diagram
10.4.3 Pin Diagram
10.4.4 Memory Organization
10.4.4.1 Program Memory
10.4.4.2 Data Memory
10.5 PIC16F8XX
10.5.1 Features
10.5.1.1 Core Features
10.5.1.2 Peripheral Features
10.5.2 Block Diagram
10.5.3 Pin Diagram
10.5.4 Memory Organization
10.5.4.1 Program Memory Organization
10.5.4.2 Data Memory Organization
10.6 Reset and Clocking in PIC
10.6.1 Reset
10.6.1.1 Power-on Reset (POR)
10.6.1.2 Brown-out Reset (BOR)
10.6.1.3 Watch Dog Timer (WDT)
10.6.2 Clocking
10.6.2.1 Clocking Scheme/Instruction Cycle
10.6.2.2 Instruction Flow/Pipelining
10.7.4 Registers
10.7.6 Program Memory Paging
10.8 I/O Ports in PIC16C6X and PIC16F87X
10.8.1 PORTA and TRISA Register
10.8.2 PORTB and TRISB Register
10.8.3 PORTC and TRISC Register
10.8.4 PORTD and TRISD Registers
10.8.5 PORTE and TRISE Registers
10.8.6 Parallel Slave Port (PSP)
10.9 Interrupts in PIC16C6X and PIC16F87X
10.9.1 INT Interrupt
10.9.2 TMR0 Interrupt
10.9.3 PortB INTCON Change
10.9.4 Context Saving During Interrupts
10.10.3 Timer2 Module
10.11 Capture/Compare/PWM Modules in PIC16C6X and PIC16F87X
10.11.1 Capture Mode
10.11.2 Compare Mode
10.11.3 PWM Mode (PWM)
PWM Duty Cycle
10.11.3.3 Setup for PWM Operation
10.12 Data EEPROM and Flash Program Memory in PIC16F87X
10.12.1 EECON1 and EECON2 Registers
10.12.2 Reading the EEPROM Data Memory
10.12.3 Writing to the EEPROM Data Memory
10.12.4 Reading the FLASH Program Memory
10.12.5 Writing to the FLASH Program Memory
10.13 ADC in PIC16F87X
10.14 Addressing Modes in PIC16C6X and PIC16F87X
10.14.1 Direct Addressing
10.14.2 Indirect Addressing
10.15 Instruction Set of PIC16CXX and PIC16F8XX
10.15.1 Instruction Descriptions
Introduction to Pentium Microprocessor
1.1 Historical Evolution of Microprocessors
To understand any microprocessor and the advancements in its family, it is necessary to look
back in time and study its evolution. In this chapter we are going to study the
Pentium microprocessor. Before going into the details of the Pentium microprocessor, we will
see the historical evolution of the Pentium microprocessor family, with the features of the 80286,
80386 and 80486.
Fig. 1.1 shows the complete Intel family of processors with their different abilities and
power. From this diagram it can be noticed that the bigger the number, the more powerful
the processor, and that adding SX to the end means a cut-down version. The first generation in
the Intel family is the 4004. After producing the 4004, Intel announced the 8008, a larger, faster
version. In 1974 Intel came out with the 8080, a considerable improvement over
its predecessors. Then Intel came out with the 8085 microprocessor. The 8008, 8080 and 8085
represent a progression of 8-bit processors, with each new device including more circuitry
and being more flexible. The next generation was the 8086, a 16-bit processor with
advanced architecture and instruction set. At the same time Intel introduced the
8088. The 8088 is a version of the 8086 with an 8-bit external data bus: it has fewer data lines
but retains all of the processing features of the 8086. Programs that run on the 8088 will also
run, without modification, on the 8086. The 8086/88 pair were the first members of the iAPX 86
family of microprocessors. (See Fig. 1.1.)
In 1982 the next version was announced, the 80186/80188, very similar to the 8086/8088 pair.
The 80186/80188 included many useful peripheral I/O functions as an integral part of the
microprocessor. The improved instruction set of the 80186/80188 supports these peripheral I/O
functions. Although the 80186 provided increased functionality, it maintained compatibility
with the 8086, ensuring that it could execute 8086 programs.
After the 80186/80188, Intel announced the 80286, a 16-bit processor like the 8086. The
80286 was the first family member designed specifically for use as a CPU in a multi-user
microcomputer. It contains many advanced modes of operation not supported by the 8086.
The 80286 boasted a new mode of operation : protected mode. Due to this, the entire
concept of memory segmentation was changed. Virtual memory management circuitry
was included in the 80286, which allows an 80286 to operate in either real address mode
or protected virtual address mode.
Fig. 1.1 The complete Intel family
Features of 80286
1. The 80286 is a 16-bit processor. The 16-bit ALU allows it to process 16-bit data.
2. It has a 24-bit address bus, so it can access up to 16 Mbytes (2^24) of physical memory
or 1 Gigabyte (2^30) of virtual memory.
3. The 80286 can be operated at three different clock speeds. These are 4 MHz
(80286-4), 6 MHz (80286-6), and 8 MHz (80286).
4. The 80286 includes special instructions to support operating systems.
5. The 80286 is housed in a 68-pin leadless flat package. This makes it possible to
provide separate pins for address lines and data lines, which speeds up processing
and simplifies the hardware.
6. It contains four separate processing units. These are the Bus Unit (BU), the
Instruction Unit (IU), the Address Unit (AU) and the Execution Unit (EU). This
pipelined architecture greatly improves the performance of the 80286.
7. The 80286 microprocessor is compatible with the earlier 8086, 8088, 80186 and
80188 chips. Virtually anything that runs under those microprocessors will also run
under the 80286.
8. It has virtual memory-management circuitry and protection circuitry, which allows
an 80286 to operate in either real address mode or protected virtual address mode.
9. The 80286 was the first family member designed specifically for use as a CPU in a
multi-user microcomputer.
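The memory sizes quoted in the feature list follow directly from the number of address bits. The short Python sketch below reproduces that arithmetic; the function names `addressable_bytes` and `describe` are illustrative only and do not come from the text.

```python
# Sketch: memory reachable with a given number of address bits.
# Function names are hypothetical helpers, chosen for illustration.

def addressable_bytes(address_bits: int) -> int:
    """Number of distinct byte addresses for an address of this width."""
    return 2 ** address_bits


def describe(size_in_bytes: int) -> str:
    """Express a power-of-two size in the largest whole binary unit."""
    units = ["bytes", "Kbytes", "Mbytes", "Gbytes", "Tbytes"]
    index = 0
    while size_in_bytes >= 1024 and index < len(units) - 1:
        size_in_bytes //= 1024
        index += 1
    return f"{size_in_bytes} {units[index]}"


# 80286 figures from the feature list: 24-bit physical, 30-bit virtual address.
physical_80286 = describe(addressable_bytes(24))
virtual_80286 = describe(addressable_bytes(30))
print(physical_80286, virtual_80286)
```

The same helper can be applied to any other address width mentioned in the chapter.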
In 1986, the next advanced processor, the 80386DX, was introduced. As expected, the
80386DX is faster than any of its predecessors, with a minimum operating frequency of
16 MHz. It is a 32-bit processor with a 32-bit register set, address bus and data bus.
Chip | Introduction | Data bus | Address bus
4004 | 1971 | 4 | 8
8008 | 1972 | 8 | 8
8080 | 1974 | 8 | 16
8085 | 1977 | 8 | 16
8086/88 | 1978 | |
80186/188 | 1982 | |
80286 | 1983 | |
80386DX | 1986 | |
80386SX | 1988 | |
80486DX (with coprocessor) | 1989 | |
80486SX (without coprocessor) | 1989 | |
Pentium | 1993 | |
Table 1.1 80X86 family tree
During 1988, an “economy version” of the 80386, called the 80386SX was introduced
by Intel. This processor had the same outside connections as the 80286, but inside it was a
386-processor supporting the 386’s expanded instruction set and various operating modes.
The Table 1.1 shows the 80X86 family tree.
Features of 80386
1. The 80386 is a 32-bit processor. The 32-bit ALU allows it to process 32-bit data.
2. It has a 32-bit address bus, so it can access up to 4 Gbytes (2^32) of physical memory or
64 Terabytes (2^46) of virtual memory (explained in a later section).
3. The 80386 runs at clock speeds up to 20 MHz.
4. The pipelined architecture of the 80386 allows simultaneous instruction fetching,
decoding, execution and memory management. Instruction pipelining, a high bus
bandwidth and on-chip address translation significantly shorten the average
instruction execution time of the 80386. These architectural design features enable the
80386 to execute 3 to 4 million instructions per second.
5. It allows programmers to switch between different operating systems such as
PC-DOS and UNIX.
6. It can operate on 7 different data types :
a. Bit b. Byte c. Word d. Double word
e. Pword f. Quadword g. Tenbyte.
7. It has built-in virtual memory management circuitry and protection circuitry
required to operate the 80386 in the modes listed below.
8. The 80386 can operate in real mode, protected mode or a variation of protected
mode called virtual 8086 mode. In real mode it functions basically as a fast 8086 or
real mode 80286. Protected mode operation provides paging, virtual
addressing, multilevel protection, multitasking and debugging capabilities.
9. The 80386 microprocessor is compatible with the earlier 8086, 8088, 80186, 80188
and 80286 chips. Virtually anything that runs under these microprocessors will also
run under the 80386.
Early in 1989, Intel introduced the 80486DX, the more highly integrated microprocessor
with built-in coprocessor. Meanwhile, Intel has also developed step-down version 80486SX
(without coprocessor and lower clock speed).
Features of 80486
1. It is a highly integrated device containing about 1.2 million transistors.
2. The 80486 operates at 25 MHz, 33 MHz, 50 MHz, 66 MHz or 100 MHz.
3. It has a built-in math coprocessor.
4. The 80486 is a 32-bit architecture with on-chip memory management and cache
memory units.
5. On-chip cache memory allows frequently used data and code to be stored on-chip,
thereby reducing accesses to the external bus.
6. The MMU consists of a segmentation unit and a paging unit. Segmentation allows
management of the logical address space by providing easy data and code
relocatability and efficient sharing of global resources. Paging allows operating
system designers to make physical memory appear to be anywhere in the
4 Gigabyte address space. In protected virtual mode it can manage up to
64 Terabytes of virtual memory.
7. The MMU provides four levels of protection for isolating and protecting
applications and the operating system from each other.
8. The 80486 has three modes of operation : Real mode, Protected mode and Virtual
8086 mode.
9. It is available in two versions : 80486 DX and 80486SX. The only difference
between these two versions is that the 80486SX does not contain the numeric
coprocessor.
10. Most of the 80486 instructions require only one clock instead of two clocks
required by the 80386.
11. It supports five-stage instruction pipeline scheme that allows it to execute
instructions much faster than 80386.
12. It executes conditional JUMP instructions more efficiently. When the 80486 decodes
a conditional jump instruction, it automatically prefetches one or more instructions
from the jump destination address just in case the jump is taken. Therefore, if the
branch is taken, the 80486 does not have to wait through a bus cycle for the first
instruction at the branch address.
13. It has built-in parity check/generator unit to implement parity detection and
generation for memory reads and writes.
14. It supports burst mode memory reads and writes to implement fast cache fills.
15. It executes a few new instructions that control the internal cache memory and
allow addition (XADD) and comparison (CMPXCHG) with an exchange and a byte
swap (BSWAP) operation. Other than these few additional instructions, the 80486 is
100 percent compatible with the 80386 and 80387.
16. It supports built-in-self-test. It tests microprocessor, coprocessor, and cache at reset
time. If the 80486 passes the test, EAX contains a zero.
17. It has additional test registers (TR3 - TR5) to test the cache memory.
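The three instructions named in item 15 can be illustrated with a small behavioural model. The sketch below is written in Python rather than assembly and models only the register-level effect on 32-bit operands; it ignores the arithmetic flags other than the zero flag of CMPXCHG, and the function names and return conventions are assumptions made for this illustration.

```python
# Behavioural sketch of XADD, CMPXCHG and BSWAP on 32-bit values.
# These are illustrative Python models, not an assembler or emulator.

MASK32 = 0xFFFFFFFF


def xadd(dest: int, src: int) -> tuple[int, int]:
    """Exchange and add: dest receives the sum, src receives the old dest.

    Returns (new_dest, new_src).
    """
    return (dest + src) & MASK32, dest


def cmpxchg(accumulator: int, dest: int, src: int) -> tuple[int, int, bool]:
    """Compare and exchange against the accumulator (EAX).

    If accumulator == dest, dest is replaced by src and the zero flag is set.
    Otherwise the accumulator is loaded with dest and the zero flag is clear.
    Returns (new_accumulator, new_dest, zero_flag).
    """
    if accumulator == dest:
        return accumulator, src, True
    return dest, dest, False


def bswap(value: int) -> int:
    """Reverse the byte order of a 32-bit value."""
    return int.from_bytes((value & MASK32).to_bytes(4, "little"), "big")


print(xadd(5, 3))
print(cmpxchg(7, 7, 9))
print(hex(bswap(0x12345678)))
```

BSWAP is typically used to convert between little-endian and big-endian data, while XADD and CMPXCHG are building blocks for counters and locks shared between tasks.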
The Pentium, introduced in 1993, was similar to the 80386 and 80486 microprocessors.
It contained a larger internal cache, and its data bus width was extended to 64 bits. Table 1.2
shows the comparison between various Pentium processors.
Processor | Clock speed | Year | Data bus width | Memory size | L1 cache (Data - Code) | L2 cache | Bus transfer speed
Pentium | 60, 66, 120, 133, 233 MHz | 1993 | 64 | 4 GByte | 8K - 8K | - | 60 - 66 MHz
Pentium Pro | 150 - 166 MHz | 1995 | 64 | 64 GByte | 8K - 8K | 256K | 60 - 66 MHz
Pentium II | 350, 400, 450 MHz | | | | 16K - 16K | 512K | 100 MHz
Pentium II Xeon | | | | 64 GByte | 16K - 16K | 512K or 1M | 100 MHz
Pentium III (Slot 1 version) | 1 GHz | | | 64 GByte | 16K - 16K | 512K | 100 MHz
Pentium III (Flip chip version) | 1 GHz | 1998 | 64 | 64 GByte | 16K - 16K | 256K | 100 MHz
Pentium III Celeron | 1 GHz | 1998 | 64 | 64 GByte | 16K - 16K | 256K | 66 MHz
Pentium IV | 1.3, 1.4, 1.5 GHz | 2000 | 64 | 64 GByte | 16K - 16K | 256K | 100 MHz
Table 1.2 Comparison between Pentium processors
• Pentium IV uses the RAMBUS memory technology in place of the SDRAM technology
used in other Pentium processors.
1.2 Pentium Features
The Pentium processor family architecture contains all of the features of the 80486
microprocessor and provides significant additions and enhancements as given below :
• Wider Data Bus Width : The Pentium processors have a wider data bus.
The data bus width has been increased from 32 bits to 64 bits to improve the data
transfer rate. Burst read and burst write-back cycles are supported by the Pentium
processors. In addition to the 64-bit bus, bus cycle pipelining has been added to allow
two bus cycles to be in progress simultaneously.
• Faster Floating Point Unit : The floating-point unit has been completely redesigned
over the 80486 CPU. Faster algorithms provide up to ten times speed-up for
common operations including add, multiply, and load.
• Improved Cache Structure : Pentium processors include separate code and data
caches integrated on-chip to meet performance goals. Each cache is 8 Kbytes in
size, with a 32-byte line size, and is 2-way set associative. Each cache has a
dedicated Translation Lookaside Buffer (TLB) to translate linear addresses to
physical addresses. The data cache is configurable to be write-back or write-through
on a line-by-line basis and follows the MESI protocol. The data cache tags
are triple ported to support two data transfers and an inquire cycle in the same
clock. The code cache is an inherently write-protected cache. The code cache tags
are also triple ported to support snooping and split line accesses. Individual pages
can be configured as cacheable or non-cacheable by software or hardware. The
caches can be enabled or disabled by software or hardware.
• Dual Integer Processor : The Pentium processor has a dual integer processor. It allows
execution of two instructions per clock.
• Branch Prediction Logic : The Pentium uses a technique called branch prediction to
predict whether a branch will be taken or not. To implement branch prediction, the
Pentium processor has two prefetch buffers, one to prefetch code in a linear
fashion, and one that prefetches code according to the Branch Target Buffer (BTB).
Therefore, the needed code is almost always prefetched before it is required for
execution.
• Data Integrity and Error Detection : The Pentium processors have added
significant data integrity and error detection capability. Data parity checking is still
supported on a byte-by-byte basis. Address parity checking and internal parity
checking features have been added, along with a new exception, the machine check
exception.
• Functional Redundancy Checking : The Pentium processors have implemented
functional redundancy checking to provide maximum error detection of the
processor and the interface to the processor. When functional redundancy checking
is used, a second processor, the "checker", executes in lock step with the
"master" processor. The checker samples the master's outputs, compares those
values with the values it computes internally, and asserts an error signal if a
mismatch occurs.
• Enhanced Virtual 8086 Mode : Enhancements to the virtual 8086 mode have
been made to increase performance by reducing the number of times it is necessary
to trap to a virtual 8086 monitor.
• Superscalar Processor : Processors capable of parallel execution of
multiple instructions are known as superscalar processors. The Pentium is capable,
under special circumstances, of executing two integer or two floating point
instructions simultaneously, and thus it supports superscalar architecture.
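The lock-step comparison described under Functional Redundancy Checking can be pictured with a toy model. In the Python sketch below, the "processors" are plain functions and the injected fault is invented purely for illustration; it only shows the idea of comparing the master's outputs with the checker's own results and flagging the first mismatch.

```python
# Toy model of functional redundancy checking (FRC).
# The adders and the fault are hypothetical; real FRC compares pin outputs.

def run_lockstep(operations, master, checker):
    """Run master and checker on the same inputs, step by step.

    Returns the index of the first step whose results differ, or None
    if the two never disagree. A real checker would assert an error
    signal at that point instead of returning a value.
    """
    for step, (a, b) in enumerate(operations):
        if master(a, b) != checker(a, b):
            return step
    return None


def good_adder(a, b):
    """Reference 32-bit addition."""
    return (a + b) & 0xFFFFFFFF


def faulty_adder(a, b):
    """Same addition, but with bit 0 of the result stuck at 1 (made-up fault)."""
    return good_adder(a, b) | 1


operations = [(1, 2), (3, 4), (2, 2)]
print(run_lockstep(operations, good_adder, good_adder))
print(run_lockstep(operations, faulty_adder, good_adder))
```

The fault goes unnoticed for the first two steps because their correct results are odd anyway, which is why the checker must compare every output rather than sample occasionally.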
The Pentium Pro is a still faster version of the Pentium. It contains a modified
internal architecture that can schedule up to five instructions for execution, and an even
faster floating point unit. It also contains a 256 K-byte or 512 K-byte level two cache in
addition to the 16 K-byte (8 K for data and 8 K for instruction) level one cache. The
Pentium Pro includes error correction circuitry (ECC) to correct a one bit error and
indicate a two bit error. It provides four additional address lines, which makes it possible
to access 64 Gbytes of directly addressable memory space.
1.3 Pentium Architecture and Functional Description
Fig. 1.2 shows the internal architecture of the Pentium processor. As shown in Fig. 1.2, it is
a complex processor with many interlocking parts. At the heart of the processor there are
two pipelines, the U pipeline and the V pipeline. The U pipeline can execute all integer
and floating-point instructions. The V pipeline can execute simple integer instructions and
the FXCH floating-point instruction. Furthermore, during execution, the U and V
pipelines are capable of executing two integer instructions at the same time, under special
conditions.
Fig. 1.2 Pentium architecture block diagram
Bus Unit
The Pentium communicates with the outside world via a 32-bit address bus and a
64-bit data bus. The bus unit is capable of performing burst reads and writes of 32 bytes
to memory, and through bus cycle pipelining it allows two bus cycles to be in progress
simultaneously. It consists of following functional entities :
• Address Drivers and Receivers : During bus cycles the address drivers push the
address onto the processor's local address bus (A31 : A3 and BE7 : BE0). The address
bus transfers addresses back to the Pentium address receivers during cache snoop
cycles. Only address lines A31 : A5 are input during cache snoop cycles.
• Write Buffers : The Pentium processor provides two write buffers, one for each of
the two internal execution pipelines. This architecture improves performance when
back-to-back writes occur.
• Data Bus Transceivers : The transceivers send data onto the Pentium processor's
local data bus during write bus cycles, and receive data into the processor during
read bus cycles.
• Bus Control Logic : The Bus Control Logic controls whether a standard or burst
bus cycle is to be run. Standard bus cycles are run to access I/O locations and
non-cacheable memory locations, as well as for cacheable memory write operations.
During these bus cycles the transfer size will be either 8, 16 or 32 bits as specified
by the instruction. Burst cycles are run by the Pentium processor during cache line
fills and during cache write-back bus cycles from the data cache. Four quad-words
are transferred during each burst bus cycle.
• Bus Master Control : Bus Master control signals allow the processor to request the
use of the buses from the arbiter and to be preempted by other bus masters in the
system.
• Level Two (L2) Cache Control : The Pentium processor includes the ability to
control the operation of an external L2 (secondary) cache.
• Internal Cache Control : Internal Cache Control logic monitors input signals to
determine when to snoop the address bus, and drives output signals to notify external
logic of the results of a snoop operation. It also ensures proper cache coherency.
• Parity Generation and Control : It generates even data parity for each of the eight
data paths during write bus cycles and checks parity on read bus cycles. It also
generates a parity bit for the address during write bus cycles and checks address
parity during external cache snoop operations.
Fig. 1.3 The elements comprising the Pentium processor bus unit
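The even data parity mentioned under Parity Generation and Control can be sketched in a few lines of Python. The sketch assumes that the eight data paths are the eight byte lanes of the 64-bit data bus, with one parity bit per lane; the function names are illustrative and are not taken from the text.

```python
# Sketch of even data parity over the eight byte lanes of a 64-bit word.
# Assumption: one parity bit per byte lane; lane 0 is the least significant byte.

def even_parity_bit(byte: int) -> int:
    """Parity bit that makes the total number of 1s (data + parity) even."""
    return bin(byte & 0xFF).count("1") & 1


def data_parity_bits(data64: int) -> list[int]:
    """One even-parity bit per byte lane, as generated on a write cycle."""
    return [even_parity_bit((data64 >> (8 * lane)) & 0xFF) for lane in range(8)]


def parity_ok_on_read(data64: int, parity_bits: list[int]) -> bool:
    """Check performed on a read cycle: recompute parity and compare."""
    return data_parity_bits(data64) == list(parity_bits)


word = 0x0000_0000_0000_0301          # lane 0 = 0x01, lane 1 = 0x03
stored_parity = data_parity_bits(word)
corrupted = word ^ 0x10               # flip one bit in lane 0
print(stored_parity)
print(parity_ok_on_read(word, stored_parity))
print(parity_ok_on_read(corrupted, stored_parity))
```

A single flipped bit in any lane changes that lane's parity, so the read check fails; two flipped bits in the same lane would cancel out, which is the usual limitation of a single parity bit.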
Code Cache
An 8 KB instruction cache is used to provide quick access to frequently used
instructions. It holds copies of the most frequently used instructions, and it is dedicated to
supplying instructions to each of the processor's execution pipelines. The cache is
organized as a two-way set associative cache with a line size of 32 bytes. The cache
directory is triple ported to allow two simultaneous accesses from the prefetcher and to
support snooping. When an instruction is not found in the code (instruction) cache, it is
read from the external memory and a copy is placed into the code cache for future
references.
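The cache parameters given above (8 KB, two-way set associative, 32-byte lines) fix how an address is divided when the cache is searched. The Python sketch below works out that geometry, assuming a 32-bit address and the usual tag / set index / byte offset split; the constant and function names are illustrative. It also shows why a line fill over the 64-bit data bus takes four transfers.

```python
# Sketch: geometry of an 8 KB, 2-way set associative cache with 32-byte lines.
# Assumes a 32-bit address split into tag | set index | byte offset.

CACHE_BYTES = 8 * 1024
WAYS = 2
LINE_BYTES = 32
ADDRESS_BITS = 32
DATA_BUS_BYTES = 8                                   # 64-bit data bus

LINES = CACHE_BYTES // LINE_BYTES                    # total cache lines
SETS = LINES // WAYS                                 # lines are grouped in pairs
OFFSET_BITS = LINE_BYTES.bit_length() - 1            # selects a byte in a line
INDEX_BITS = SETS.bit_length() - 1                   # selects a set
TAG_BITS = ADDRESS_BITS - INDEX_BITS - OFFSET_BITS   # stored in the directory
TRANSFERS_PER_LINE_FILL = LINE_BYTES // DATA_BUS_BYTES


def split_address(address: int) -> tuple[int, int, int]:
    """Return (tag, set_index, byte_offset) for a 32-bit address."""
    offset = address & (LINE_BYTES - 1)
    set_index = (address >> OFFSET_BITS) & (SETS - 1)
    tag = address >> (OFFSET_BITS + INDEX_BITS)
    return tag, set_index, offset


print(LINES, SETS, OFFSET_BITS, INDEX_BITS, TAG_BITS, TRANSFERS_PER_LINE_FILL)
print([hex(field) for field in split_address(0x12345678)])
```

On a lookup, the set index selects one of the sets and the tag is compared against the two directory entries of that set; a match in either way is a hit.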
Prefetcher
The prefetcher requests instructions from the code cache. If the requested instruction is
not in the cache, a burst bus cycle is run to external memory to perform a cache line fill.
Prefetch Buffers
The Pentium provides four prefetch buffers, which work as two independent pairs. When
instructions are prefetched from the cache, they are placed into one pair of prefetch buffers,
while the other pair remains idle. When a branch is predicted as taken by the Branch
Target Buffer (BTB), the instructions at the predicted branch target address are requested
from the cache and placed in the second pair of buffers, which was previously idle. Because
of this, the processor can get the new instructions from the branch address without delay.
Instruction Decode Unit
Pentium provides two stage decoding. The instructions are decoded in two stages
known as Decode 1 (D1) and Decode 2 (D2). During D1, the opcode is decoded in both
pipelines to determine whether the two instructions can be paired according to the
Pentium processor's pairing rules. If pairing is possible, the two instructions are sent
simultaneously to the stage 2 decode. During D2 the addresses of memory resident operands
are calculated.
Control Unit
It is also referred to as the Microcode Unit. This control unit consists of the
following sub-units :
• Microcode Sequencer
• Microcode Control ROM
This unit interprets the instruction word and microcode entry points fed to it by the
Instruction Decode Unit. It handles exceptions, breakpoints and interrupts. In addition, it
controls the integer pipelines and floating-point sequences.
Arithmetic/Logic Units (ALUs)
The Pentium provides two ALUs to perform the arithmetic and logical operations specified
by the instructions in their respective pipelines. The ALU for the "U" pipeline can complete
an operation prior to the ALU in the "V" pipeline, but the opposite is not true.
Address Generators
Pentium provides two Address Generators (one for each pipeline). They generate the
addresses specified by the instructions in their respective pipelines.
Data Cache
A separate internal Data Cache holds copies of the most frequently used data
requested by the two integer pipelines and the Floating Point Unit. The internal data cache
is an 8KB write-back cache, organized as two-way set associative with 32-byte lines. The
Data Cache directory is triple ported to allow simultaneous access from each of the
pipelines and to support snooping.
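The cache geometry given above (8 KB, two-way set associative, 32-byte lines) fixes the number of sets and the width of each address field. The following is a minimal sketch of that arithmetic. The function name `split_address` is hypothetical, and the assumption that the lowest address bits select the byte within a line and the next bits select the set is the conventional arrangement, not something stated in the text.

```python
# Cache geometry taken from the text: 8 KB, two-way set associative, 32-byte lines.
CACHE_BYTES = 8 * 1024
WAYS = 2
LINE_BYTES = 32

# Each set holds WAYS lines, so the number of sets is 8192 / (2 * 32) = 128.
NUM_SETS = CACHE_BYTES // (WAYS * LINE_BYTES)
OFFSET_BITS = LINE_BYTES.bit_length() - 1   # 5 bits select a byte within a line
INDEX_BITS = NUM_SETS.bit_length() - 1      # 7 bits select a set


def split_address(addr):
    """Split a 32-bit address into (tag, set index, byte offset).

    Assumes the conventional layout: offset in the lowest bits,
    set index above it, and the remaining upper bits as the tag.
    """
    offset = addr & (LINE_BYTES - 1)
    index = (addr >> OFFSET_BITS) & (NUM_SETS - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset
```

Under these assumptions a 32-bit address splits into a 20-bit tag, a 7-bit set index and a 5-bit byte offset.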
Paging Unit
It is enabled by setting the PG bit in CR0. It translates the linear address (from the
address generator) to a physical address. It can handle two linear addresses at the same
time to support both pipelines.
Floating-Point Unit
The floating-point unit performs floating-point operations. It can accept up to two
floating-point operations per clock when one of the instructions is an exchange instruction.
1.4 Pin Description
Fig. 1.4 shows the pin diagram of the Pentium processor and Fig. 1.5 shows the
pin diagram of the Pentium processor with functional grouping.
Fig. 1.4 Pin diagram of Pentium processor (top view of the pin grid array)
Fig. 1.5 Pin diagram of Pentium processor with functional grouping (groups shown : clock,
initialization, dual processing, branch trace, probe mode, power management, address bus,
address mask, bus frequency, data bus, address parity, data parity, TAP port, internal parity
error, system error, functional redundancy checking, system management mode, bus cycle
definition, programmable interrupt control, bus control, interrupts, page cacheability, cache
control, bus arbitration, cache snooping/consistency, write ordering (EWBE), cache flush
(FLUSH), breakpoint/performance monitoring)
Pentium Hardware Signals
Common Signals
Changed Functionality
A20M
Address Mask : When asserted, forces the Pentium to limit addressable
memory to 1 MB to emulate the memory space of the 8086. This signal
is active only in real mode.
A31-A3
These 29 address lines, together with the byte enable outputs, form the
Pentium's 32-bit address bus. With this 32-bit address a memory space
of 4 gigabytes can be accessed.
ADS
Address Strobe : When low, indicates the beginning of a new bus cycle.
AHOLD
Address Hold : This signal is used to place the Pentium's address bus
into a high impedance state so that an inquire cycle can be run.
AP (I/O)
Address parity is driven by the Pentium processor with even parity
information. It is generated in the same clock in which the address is driven.
Even parity must be driven back to the Pentium processor during inquire
cycles on this pin in the same clock.
APCHK
The address parity check status pin is asserted if the Pentium
processor has detected a parity error on the address bus during inquire
cycles.
APICEN
Advanced Programmable Interrupt Controller (APIC) Enable : This
signal is used to enable or disable the Pentium's internal APIC interrupt
controller circuit.
BE7-BE0
The byte enable pins are used to determine which bytes must be
written to external memory, or which bytes were requested by the CPU
for the current cycle. The byte enables are driven in the same clock as
the address lines (A31-A3).
The purpose of each byte enable is as follows :
Output    Data Bus Enabled
BE0       D7-D0
BE1       D15-D8
BE2       D23-D16
BE3       D31-D24
BE4       D39-D32
BE5       D47-D40
BE6       D55-D48
BE7       D63-D56
BF
These inputs are sampled during reset and they control the ratio of bus
frequency to CPU core frequency.
BF = 1 : bus/core ratio = 2/3
BF = 0 : bus/core ratio = 1/2
BOFF
Back Off : This input causes the processor to terminate any bus cycle
currently in progress and tri-state its buses. Execution of the interrupted
bus cycle is restarted when BOFF goes high.
BP[3:2], PM/BP[1:0]
The breakpoint pins (BP3-BP0) correspond to the debug registers
DR3-DR0. These pins externally indicate a breakpoint match when the
debug registers are programmed to test for breakpoint matches.
BP1 and BP0 are multiplexed with the performance monitoring pins
(PM1 and PM0). The PB1 and PB0 bits in the Debug Mode Control
Register determine if the pins are configured as breakpoint or
performance monitoring pins.
BRDY
Burst Ready : In the Pentium, the BRDY signal is used to indicate that the
external device is ready to transfer data.
BREQ
Bus Request : This signal, when active, indicates that the Pentium has
generated a bus request.
BT3-BT0
The branch trace outputs provide bits 2-0 of the branch target linear
address and the default operand size on BT3. These outputs become
valid during a branch trace special message cycle.
BUSCHK
The bus check input allows the system to signal an unsuccessful
completion of a bus cycle. If this pin is sampled active, the Pentium
processor will latch the address and control signals in the machine check
registers. If, in addition, the MCE bit in CR4 is set, the Pentium
processor will vector to the machine check exception.
CACHE
The output indicates whether the data associated with the current bus
cycle is being read from or written to the data cache.
CLK
This is the clock signal for the Pentium. It decides the operating
frequency of the Pentium. For example, to operate the Pentium at
66 MHz, we apply a 66 MHz clock to this pin.
D/C
Data/Code : It indicates that the current bus cycle is accessing code
(D/C = 0) or data (D/C = 1).
D63-D0 (I/O)
These are the 64 data lines for the processor. Lines D7-D0 define the
least significant byte of the data bus; lines D63-D56 define the most
significant byte of the data bus.
DP7-DP0 (I/O)
These are the data parity pins for the processor. There is one for each
byte of the data bus. They are driven by the Pentium processor with
even parity information on writes in the same clock as write data. Even
parity information must be driven back to the Pentium processor on
these pins in the same clock as the data to ensure that the correct
parity check status is indicated by the Pentium processor. DP7 applies to
D63-D56; DP0 applies to D7-D0.
EADS (I)
External Address Strobe : It is used to indicate that an external
address may be read by the address bus during an inquire cycle.
EWBE (I)
The external write buffer empty input, when inactive (high), indicates
that a write cycle is pending in the external system.
FERR (O)
Floating Point Error : This output goes low when the floating-point unit of
the Pentium processor generates an error.
FLUSH (I)
When asserted, the cache flush input forces the Pentium processor to
write back all modified lines in the data cache and code cache.
FRCMC (I)
The functional redundancy checking master/checker mode input is
used to determine whether the Pentium processor is configured in
master mode or checker mode. When configured as a master, the
Pentium processor drives its output pins as required by the bus
protocol. When configured as a checker, the Pentium processor
tristates all outputs (except IERR) and samples the output pins.
The configuration as a master/checker is set after RESET and may not
be changed other than by a subsequent RESET.
HIT (O)
The hit indication is driven to reflect the outcome of an inquire cycle. If
an inquire cycle hits a valid line in either the Pentium processor data or
instruction cache, this pin is asserted two clocks after EADS is sampled
asserted. If the inquire cycle misses the Pentium processor cache, this
pin is negated two clocks after EADS.
HITM (O)
The hit to a modified line output is driven to reflect the outcome of
an inquire cycle. It is asserted after inquire cycles which resulted in a
hit to a modified line in the data cache. It is used to inhibit another bus
master from accessing the data until the line is completely written back.
HLDA (O)
Hold Acknowledge : This output goes high in response to a HOLD
request to indicate that the Pentium has been placed in the hold state.
HOLD (I)
When high, the Pentium tri-states its bus signals and activates HLDA.
IBT (O)
Instruction branch taken indicates that the Pentium has taken an
instruction branch.
IERR (O)
The internal error pin is used to indicate two types of errors, internal
parity errors and functional redundancy errors. If a parity error occurs
on a read from an internal array, the Pentium processor will assert the
IERR pin for one clock and then shut down. If the Pentium processor is
configured as a checker and a mismatch occurs between the value
sampled on the pins and the corresponding value computed internally,
the Pentium processor will assert IERR two clocks after the mismatched
value is returned.
IGNNE (I)
Ignore Numeric Exception : A low on this input allows the processor to
continue executing floating-point instructions, even if an error is
generated.
INIT (I)
The Pentium processor initialization input pin forces the Pentium
processor to begin execution in a known state. The processor state
after INIT is the same as the state after RESET except that the internal
caches, write buffers and floating-point registers retain the values they
had prior to INIT. If INIT is sampled high when RESET transitions from
high to low, the Pentium processor will perform built-in self test prior to
the start of program execution.
INV
The invalidation input determines the final cache line state (shared or
invalidated) in case of an inquire cycle hit.
IU
This output goes high for one clock cycle each time an instruction
completes in the U pipeline.
IV
This output goes high for one clock cycle each time an instruction
completes in the V pipeline.
KEN
Cache Enable : This signal is used to determine whether the current cycle
is cacheable or not.
LOCK
Bus Lock : This signal goes low to indicate that the current bus cycle is
locked and may not be interrupted by any other bus master.
M/IO
Memory/Input-Output : This signal indicates the type of the current bus
cycle.
M/IO = 0 : I/O cycle
M/IO = 1 : memory cycle
NA
An active next address input indicates that the external memory system
is ready to accept a new bus cycle although all data transfers for the
current cycle have not yet completed.
NMI
This is the non-maskable interrupt signal of the Pentium.
PBGNT
Private Bus Grant : This signal is used in a dual-processing system to
indicate when private bus arbitration is allowed.
PBREQ
Private Bus Request : This signal is used to request a private bus
operation in a dual-processing system.
PCD
The page cache disable pin reflects the state of the PCD bit in CR3,
the Page Directory Entry, or the Page Table Entry. The purpose of PCD
is to provide an external cacheability indication on a page-by-page basis.
PCHK
Data Parity Check : This output goes low if the Pentium detects a
parity error on the data bus. In the Pentium, parity checking has been
extended; if PEN is also asserted low during the same cycle, the
Pentium will save a copy of the address and control signals in an
internal machine check register. Additionally, if the MCE bit in the new
CR4 register is set, a machine check exception is generated.
PEN
Parity Enable : If this input is low during the same cycle in which a parity error
is detected, the Pentium will save a copy of the address and control signals in
an internal machine check register.
PHIT
Private Hit : It is used to maintain the local cache in a dual-processor
system.
PHITM
Private Modified Hit : It is used in conjunction with PHIT to maintain
the local cache in a dual-processor system.
PICCLK
Programmable Interrupt Controller Clock : This clock determines the
serial data rate in the internal APIC.
PRDY
The probe ready output pin indicates that the processor has stopped
normal execution in response to the R/S pin going active, or probe
mode being entered. This output is used for debugging purposes.
PWT (O)
The page write through pin reflects the state of the PWT bit in CR3,
the page directory entry, or the page table entry. The PWT pin is used
to provide an external write back indication on a page-by-page basis.
R/S
The run/stop input is an asynchronous, edge-sensitive interrupt used to
stop the normal execution of the processor and place it into an idle
state.
RESET
This signal forces the Pentium to initialize its registers to a known state,
invalidate the code and data caches, and fetch its first instruction from
address FFFFFFF0H. This signal must be active for at least 1 ms after
power on.
SCYC
The split cycle output is asserted during misaligned LOCKed transfers
to indicate that more than two cycles will be locked together. This
signal is defined for locked cycles only.
SMI
The system management interrupt causes a system management
interrupt request to be latched internally. When the latched SMI is
recognized on an instruction boundary, the processor enters System
Management Mode.
SMIACT
An active system management interrupt active output indicates that
the processor is operating in System Management Mode.
STPCLK
Stop Clock : When low, this signal causes the Pentium to stop its
internal clock.
TCK
The testability clock input provides the clocking function for the
Pentium processor boundary scan in accordance with the IEEE
Boundary Scan interface (Standard 1149.1).
TDI
The test data input is a serial input for the test logic. TAP instructions
and data are shifted into the Pentium processor on the TDI pin on the
rising edge of TCK when the TAP controller is in an appropriate state.
TDO
Test Data Output : This signal is used to send serial test information on
the falling edge of TCK.
TMS
The value of the test mode select input signal sampled at the rising
edge of TCK controls the sequence of TAP controller state changes.
TRST
When asserted, the test reset input allows the TAP controller to be
asynchronously initialized.
W/R
Write/Read : This signal indicates whether the current bus cycle is a read
cycle or a write cycle.
W/R = 0 : Read cycle
W/R = 1 : Write cycle
WB/WT
The write back/write through input allows a data cache line to be
defined as write back or write through on a line-by-line basis.
Table 1.5
Note : Non-shaded signals are of Pentium processor (510\60, 567\66) and shaded
signals are the additional signals provided in Pentium processor (610\75, 735\90,
815\100, 1000\120, 1110\133).
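The BE7-BE0 entry of Table 1.5 says that the byte enables, together with A31-A3, select which of the eight byte lanes of the 64-bit data bus take part in a bus cycle. The following is a small sketch of that relationship. The function name `byte_enables` is hypothetical, and the mask is a logical one (bit n set means BEn is active); the electrical polarity of the pins is not modelled.

```python
def byte_enables(address, size):
    """Return (A31-A3 value, logical byte-enable mask) for one bus transfer.

    Bit n of the mask stands for BEn, which enables data lines
    D(8n+7)-D(8n). The access must fit inside one aligned 8-byte
    block; anything else would need more than one transfer.
    """
    start = address & 7                      # byte lane of the first byte
    if size < 1 or start + size > 8:
        raise ValueError("access must lie within one 8-byte data bus transfer")
    mask = ((1 << size) - 1) << start        # 'size' consecutive lanes
    return address >> 3, mask                # upper address bits drive A31-A3
```

For example, a 4-byte access at address 1004H lies in the upper half of its 8-byte block, so only BE4-BE7 (data lines D63-D32) are active.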
Pin Grouping According to Function
Table 1.6 organizes the pins with respect to their function.
Function                              Pins
Clock                                 CLK
Initialization                        RESET, INIT
Address Bus                           A31-A3, BE7-BE0
Address Mask                          A20M
Data Bus                              D63-D0
Address Parity                        AP, APCHK
Data Parity                           DP7-DP0, PCHK, PEN
Internal Parity Error                 IERR
System Error                          BUSCHK
Bus Cycle Definition                  M/IO, D/C, W/R, CACHE, SCYC, LOCK
Bus Control                           ADS, BRDY, NA
Page Cacheability                     PCD, PWT
Cache Control                         KEN, WB/WT
Cache Snooping/Consistency            AHOLD, EADS, HIT, HITM, INV
Cache Flush                           FLUSH
Write Ordering                        EWBE
Bus Arbitration                       BOFF, BREQ, HOLD, HLDA
Interrupts                            INTR, NMI
Floating Point Error Reporting        FERR, IGNNE
System Management Mode                SMI, SMIACT
Functional Redundancy Checking        FRCMC (IERR)
TAP Port                              TCK, TMS, TDI, TDO, TRST
Breakpoint/Performance Monitoring     PM/BP0, PM/BP1, BP2, BP3
Power Management                      STPCLK
Probe Mode                            R/S, PRDY
Branch Trace                          BT3-BT0, IBT
Dual Processing                       CPUTYP, D/P, DPEN, PBGNT, PBREQ, PHIT, PHITM
Programmable Interrupt Control        PICCLK, PICD0, PICD1, APICEN
Bus Frequency                         BF0-BF1
Table 1.6 Pin functional grouping
1.5 Pentium Real Mode
The Pentium microprocessor can operate in either Real Mode or Protected Mode. When
the Pentium is reset or powered up, it is initialized in Real Mode. The Pentium
maintains object code compatibility with the 8086, 80286, 80386 and 80486 running
in real mode. In this mode, the Pentium supports the same architecture as the 8086, but it can
access the 32-bit register set of the Pentium. In real mode, it is also possible to use addressing
modes with the 32-bit override instruction prefixes. In this section, we will see the operation of
the Pentium in real mode.
1.5.1 Real Mode Programming Model
The programming model makes it easier to understand the microprocessor in a
programming environment. The real mode programming model gives the programming
environment for the Pentium in real mode. It shows only those parts of the microprocessor
which the programmer can use, such as the various registers within the microprocessor. Fig. 1.6
shows the real mode programming model for the Pentium microprocessor.
In the diagram, only the shaded portion is a part of real mode. It consists of eight
16-bit registers (IP, CS, DS, SS, ES, FS, GS and the Flag register) and eight 32-bit registers
(EAX, EBX, ECX, EDX, ESP, EBP, ESI, EDI). In real mode, the Pentium can access CR0, which
is used to enter protected mode. The Protection Enable bit (PE) is used to switch
the Pentium from real to protected mode.
From this description it can be seen that the Pentium in real mode is an 8086 with extended
registers and two additional data segment registers, namely FS and GS. It also implements
separate memory and I/O address spaces. The memory space is 1,048,576 bytes (1 Mbyte) and
the I/O address space is 65,536 bytes (64 Kbytes), which is similar to the 8086 memory and
I/O address space.
1.5.2 Memory Addressing in Real Mode
As mentioned earlier, in Real Mode, memory size is limited to 1 Mbyte. Due to this,
only the address lines up to A19 are active. The higher address lines A20-A31 are normally high.
But in case of an intersegment jump or call, during CS-relative memory cycles, these address
lines (A20-A31) are low.
Even though a 1 Mbyte memory address space is available in real mode, all this memory
cannot be active at one time. Actually, the 1 Mbyte of memory is partitioned into 64K
(65536) byte segments. A segment represents an independently addressable unit of memory
consisting of 64K consecutive byte-wide storage locations. Each segment has its own
starting address, i.e. the lowest-addressed byte storage location. The segment registers hold
the starting addresses of the active segments in the entire memory. In the Pentium, only six
out of 16 (1 Mbyte / 64 Kbyte) 64 Kbyte segments can be active at a time (Code Segment,
Stack Segment, Data Segment, ES, FS and GS). Fig. 1.7 shows the active memory segments.
Fig. 1.6 Real mode programming model for Pentium processor (external memory with the
code segment (CS), data segment (DS), stack segment (SS), extra segment (ES) and data
segments (FS, GS), each of 64 Kbytes, and the 64K input/output address space)
Fig. 1.7 Active segments of memory (the segment registers CS, SS, DS, ES, FS and GS each
select one active segment between 00000H and FFFFFH)
The paging mechanism in the Pentium is not active in real mode. Thus, in real mode the
linear addresses are the same as physical addresses. Physical addresses are generated in
Real Mode by adding the contents of the appropriate segment register, shifted
left by 4 bits, to an effective address. If there is a carry generated after addition of
the shifted segment register contents and the effective address, then unlike the 8086, the
resulting 21-bit address is a linear address. This means that in the 8086 the carried
bit is truncated, whereas in the Pentium the carried bit is stored as bit 20 of the linear
address. Fig. 1.8 shows the real address mode address formation and the 21-bit
address formation when a carry is generated.
Fig. 1.8 Real address mode addressing (16-bit segment selector shifted left by 4 bits +
16-bit effective address = 21-bit linear address including the carry bit)
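The address formation of Fig. 1.8 can be sketched in a few lines. This is an illustration only; the function name `real_mode_linear` and the `keep_carry` flag are hypothetical and simply contrast the Pentium behaviour described above (carry kept as bit 20) with the 8086 behaviour (carry truncated).

```python
def real_mode_linear(segment, offset, keep_carry=True):
    """Form a real-mode linear address from a 16-bit segment and offset.

    keep_carry=True  : carry out of bit 19 is kept as bit 20 (Pentium, per the text)
    keep_carry=False : carry is truncated to a 20-bit address (8086, per the text)
    """
    full = ((segment & 0xFFFF) << 4) + (offset & 0xFFFF)
    return full if keep_carry else full & 0xFFFFF


# Segment 1234H, offset 5678H: 12340H + 5678H = 179B8H, no carry involved.
print(hex(real_mode_linear(0x1234, 0x5678)))
# Segment FFFFH, offset 0010H: FFFF0H + 10H carries into bit 20.
print(hex(real_mode_linear(0xFFFF, 0x0010)))
print(hex(real_mode_linear(0xFFFF, 0x0010, keep_carry=False)))
```

The largest value the sum can reach is FFFF0H + FFFFH = 10FFEFH, which is why a single extra address bit is enough to hold the carry.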
All segments in Real mode are a maximum of 64K bytes long. These segments may be read,
written, or executed. The Pentium generates a general protection (interrupt 13)
exception if the effective address is beyond the legal range from 0 to FFFFH.
All segment registers are accessible to the programmer, so the programmer can store
values in the segment registers such that the segments are contiguous, adjacent,
disjointed, or even overlapping. Fig. 1.9 shows all possible ways of defining
segments in the memory. For example, segments A and B are contiguous, whereas segments
B and C are overlapping.
Fig. 1.9 Contiguous, adjacent, disjointed and overlapping segments
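How two segments relate to each other follows directly from the values loaded into their segment registers. The sketch below compares the starting addresses of two full 64 Kbyte segments. The function name `segment_relation` and the exact mapping of the labels (overlapping when the start addresses are less than 64K apart, contiguous when exactly 64K apart, disjointed otherwise) are assumptions made for illustration, and wrap-around at the top of the 1 Mbyte space is ignored.

```python
SEGMENT_BYTES = 0x10000  # every real-mode segment is at most 64 Kbytes long


def segment_relation(seg_a, seg_b):
    """Classify two full 64 Kbyte segments from their segment register values."""
    base_a = (seg_a & 0xFFFF) << 4   # starting address = register value x 16
    base_b = (seg_b & 0xFFFF) << 4
    gap = abs(base_a - base_b)
    if gap < SEGMENT_BYTES:
        return "overlapping"         # the two 64K ranges share some bytes
    if gap == SEGMENT_BYTES:
        return "contiguous"          # one ends exactly where the other starts
    return "disjointed"              # unused memory lies between them


print(segment_relation(0x1000, 0x2000))
print(segment_relation(0x1000, 0x1800))
print(segment_relation(0x1000, 0x3000))
```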
1.5.3 Handling Interrupts and Exceptions in Real Mode
The Pentium supports Real Mode interrupts and exceptions much like the 8086. In the
Pentium, addresses from 0 through 3FFH (400H memory locations) are dedicated to the
Interrupt Descriptor Table (IDT) after Reset. This table contains pointers that define the
starting points of the interrupt service routines. Each pointer in the table requires four bytes
of memory. Thus, it contains up to 256 (4 x 256 = 1024 = 400H) interrupt pointers. The four
bytes in each pointer represent two words. The word having the higher memory address holds
the segment base address, whereas the word having the lower memory address holds the offset.
Fig. 1.10 shows the Interrupt Descriptor Table (IDT). As in the 8086, interrupts are recognized
by their numbers/types. Each time an interrupt occurs, the Pentium multiplies the interrupt
number/type by four to generate an index into the interrupt descriptor table.
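The lookup described above can be sketched as follows. The helper names `vector_entry_address` and `read_vector` are hypothetical, and memory is modelled as a plain byte array; the only facts used are the ones in the text (four bytes per entry, offset in the lower word, segment in the higher word) plus the little-endian byte order of the x86 family.

```python
def vector_entry_address(number, table_base=0):
    """Address of the 4-byte pointer for an interrupt number (number x 4)."""
    return table_base + 4 * number


def read_vector(memory, number, table_base=0):
    """Return (segment, offset) of the service routine for an interrupt."""
    addr = vector_entry_address(number, table_base)
    offset = memory[addr] | (memory[addr + 1] << 8)        # lower word
    segment = memory[addr + 2] | (memory[addr + 3] << 8)   # higher word
    return segment, offset


# Build a 1 Kbyte table (0 through 3FFH) and install a pointer for type 21H.
memory = bytearray(0x400)
memory[0x84:0x88] = bytes([0x34, 0x12, 0x00, 0xF0])
print(read_vector(memory, 0x21))
```

Interrupt type 21H is found at address 21H x 4 = 84H; the four bytes stored there give offset 1234H and segment F000H.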
In the Pentium, the Interrupt Descriptor Table is relocatable. The base address of the interrupt
descriptor table is held in the IDTR (Interrupt Descriptor Table Register). The
programmer can change this address by loading a different address into the IDTR. This is
possible using the LIDT instruction. The LIDT instruction allows the relocation of the base address
and is also used to specify the size of the IDT. If an interrupt occurs and the
corresponding entry in the interrupt table is beyond the limit stored in the IDTR, exception 8
(interrupt table limit too small) will occur. Table 1.7 summarises the Pentium
Real Address Mode exceptions.
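The limit check can be illustrated with a one-line test. This is a sketch under an assumption: the limit stored in the IDTR is taken to be the offset of the last valid byte of the table, so an entry is usable only if all four of its bytes lie at or below that offset. The function name `vector_within_limit` is hypothetical.

```python
def vector_within_limit(number, idtr_limit):
    """True if the 4-byte entry for this interrupt number fits inside the table.

    Assumes idtr_limit is the offset of the last valid byte of the table.
    """
    last_byte_of_entry = 4 * number + 3
    return last_byte_of_entry <= idtr_limit


# With the reset-time table of 400H bytes (limit 3FFH) all 256 types fit.
print(vector_within_limit(255, 0x3FF))
# With a table shortened to 100H bytes (limit FFH) only types 0 to 63 fit.
print(vector_within_limit(63, 0xFF), vector_within_limit(64, 0xFF))
```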
Fig. 1.10 Interrupt descriptor table (the IDTR in the CPU points to the table in memory,
which holds the gates for interrupt # 0 through interrupt # n; each gate holds an offset
word and a segment base word)
Interrupt Number | Cause of Exception | Description
0 | DIV, IDIV | Divide error
1 | All | Debug exceptions
3 | INT | Breakpoint
4 | INTO | Overflow
5 | BOUND | Bounds check
6 | Any undefined opcode or LOCK used with wrong instruction | Invalid opcode
7 | ESC or WAIT | Coprocessor not available
8 | INT vector is not within IDTR limit | Interrupt table limit too small
9-11 | | Reserved
12 | Memory operand crosses offset 0 or 0FFFFH | Stack fault
13 | Memory operand crosses offset 0FFFFH or attempt to execute past offset 0FFFFH or instruction longer than 15 bytes | Pseudo-protection exception
14, 15 | | Reserved
16 | ESC or WAIT | Coprocessor error
0-255 | INT | Two-byte software interrupt
Table 1.7 Pentium real-address mode exceptions
Note 1 : Some debug exceptions point to the faulting instruction, others to the next
instruction. By examining the contents of DR6, it is possible to determine whether the
debug exception is pointing to the faulting instruction or to the next instruction.
Note 2 : The coprocessor errors are reported on the first ESC or WAIT instruction after
the ESC instruction that caused the error.
1.6 Pentium RISC Features
Because of advances in microelectronic manufacturing technology, a number of
changes in computer architecture have taken place over the last decade. It became
possible to cram a large amount of logic into a small area of a silicon wafer. New computers
were designed around processors with complex instructions and addressing modes,
which we call Complex Instruction Set Computers (CISC). The problem that arose with
CISC machines was that their instructions required multiple clock cycles to execute because
of the large amount of logic crammed into a single package. This degraded the performance of
CISC machines. This problem is solved by a design technique called Reduced Instruction
Set Computer (RISC). The important factor considered while designing RISC machines is
the use of fewer instructions and simpler addressing modes. Because there are fewer
instructions, the number of operations is reduced and they can easily be implemented on a
silicon wafer, which increases the speed and hence improves the performance.
In this section, we will discuss the features of RISC processors and which of them are
applied in the design of the Pentium processor.
1. Reduced accesses to main memory
Ideally, computer memory should be fast, large and inexpensive. Unfortunately, it is
impossible to meet all three of these requirements simultaneously. Increased speed and
size are achieved at increased cost. A very fast memory system can be achieved if SRAM
chips are used. These chips are expensive, and for cost reasons it is impracticable to
build a large main memory using SRAM chips. The only alternative is to use DRAM chips
for large main memories. The processor fetches the code and data from the main memory to
execute the program. The DRAMs which form the main memory are slower devices, so it
is necessary to insert wait states in memory read/write cycles. This reduces the speed of
execution. Thus, although great advances have been made in memory technology, processors
are much faster than memories, and the processor has to wait during each memory access.
The RISC design includes a technique which reduces the number of accesses to main
memory. Most computer programs work with only small sections of code and data
at a particular time. In the memory system, a small section of SRAM, referred to as cache
memory, is added along with the main memory. The program which is to be executed is
loaded in the main memory, but the part of the program (code) and data in use at a
particular time is usually accessed from the cache memory. This is accomplished by
loading the active part of the code and data from main memory into cache memory. Whenever
the processor tries to read data from main memory, the cache is examined first. The
addresses of the cached data are stored in the cache. If one of these addresses matches the
address being used for the memory read, the cache will supply the data, which is called a cache hit.
Generally, cache is ten times faster than the main memory. When the required data is not
found in the cache, it is called a cache miss and the processor has to access main memory in
this situation. After a cache miss, a copy of the new data is written into the cache, so that
the data can be obtained from the cache whenever it is needed again.
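The hit/miss behaviour described above can be imitated with a toy model. This is a minimal sketch, not the Pentium's actual cache: the class `TinyCache`, its direct-mapped organisation with four 32-byte lines, and the function `average_access_time` are all assumptions chosen to keep the example short. The 10 : 1 speed ratio is the figure quoted in the text.

```python
class TinyCache:
    """Toy direct-mapped cache that only tracks which lines are present."""

    def __init__(self, num_lines=4, line_bytes=32):
        self.num_lines = num_lines
        self.line_bytes = line_bytes
        self.tags = {}          # line index -> tag of the block stored there
        self.hits = 0
        self.misses = 0

    def read(self, addr):
        """Return True on a cache hit, False on a miss (the line is then filled)."""
        block = addr // self.line_bytes
        index = block % self.num_lines
        tag = block // self.num_lines
        if self.tags.get(index) == tag:
            self.hits += 1
            return True
        self.tags[index] = tag  # copy the new data into the cache after a miss
        self.misses += 1
        return False


def average_access_time(hit_ratio, cache_time=1.0, memory_time=10.0):
    """Average access time when a hit costs cache_time and a miss memory_time."""
    return hit_ratio * cache_time + (1.0 - hit_ratio) * memory_time


cache = TinyCache()
results = [cache.read(a) for a in (0, 4, 32, 0, 128, 0)]
print(results, cache.hits, cache.misses)
```

In this run the second read of address 0 hits, but the read of address 128 maps to the same line and evicts it, so the final read of address 0 misses again. With a hit ratio of 0.9, the average access time is about 1.9 cache-access times instead of 10.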
The Pentium contains two caches of 8 KB each. An 8 KB instruction cache stores frequently
used instructions and an 8 KB data cache stores frequently used data. Initially, each cache
is empty and is filled as the program executes.
2. Simple instructions and addressing modes
(RISC feature, not available in Pentium)
When the processor uses fewer and simpler instructions and addressing modes, the
implementation of operations on a silicon wafer is easier. It also reduces the complexity of
the instruction decoder, the addressing unit and the execution unit. In this case, the
machine can be operated at a higher clock rate because less work has to be done in each
clock period. In practice, it is possible to use fewer, simpler instructions and
addressing modes because research has shown that
programmers use only a small subset of the instructions available on the processor they
are using.
From this point of view, the Pentium is not a RISC processor; it is a CISC processor.
The reason is that the Pentium must remain compatible with the installed software of the entire
80x86 family. Each and every instruction and addressing mode of the previous processor,
the 80486, must be kept as it is.
3. Large sets of registers and make good use of them
In the first feature, we have seen how the number of accesses to memory affects
the performance of a processor. Similarly, the number of registers available in a
processor can affect its performance. When a complex calculation is to be performed
by a processor, it may require the use of several data values. If all these data values are
stored in memory, then a number of memory accesses are required during the calculation
to use those data values. But when a number of registers are available in the processor,
data values can be stored in registers instead of in memory. Accessing the
internal registers for reading data values during calculations is much faster than accessing
memory for the same purpose. Thus, it is always good to have large sets of internal
registers in the processor.
The Pentium has the following sets of registers.
i) Seven general purpose registers, all of them 32 bits wide.
ii) Six segment registers, all of them 16 bits wide.
iii) A 32-bit stack pointer.
iv) Eight floating-point registers, all of them 80 bits wide.
Thus, the Pentium has a large set of registers (like a RISC).
4. Pipelining
We know that more than one clock cycle is involved in the instruction cycle. These
clock cycles are required to perform the various steps in instruction execution. These steps
belong to various processing stages in the instruction cycle. These are :
• S1 - Fetch (F) : Read the instruction from memory.
• S2 - Decode (D) : Decode the opcode and fetch source operand(s) if necessary.
• S3 - Execute (E) : Perform the operation specified by the instruction.
• S4 - Store (S) : Store the result in the destination.
Usually, an instruction is executed by performing the above mentioned stages one after the
other. When these stages for several instructions are performed simultaneously to reduce
overall processing time, the processing is called instruction pipelining.
Refer to Fig. 1.11. Here, instruction processing is divided into four stages, hence it is
known as a four-stage instruction pipeline. With this subdivision, and assuming equal
duration for each stage, we can reduce the execution time for 4 instructions from 16 time
units to 7 time units.
Clock cycle    1    2    3    4    5    6    7
Instruction
I1             F1   D1   E1   S1
I2                  F2   D2   E2   S2
I3                       F3   D3   E3   S3
I4                            F4   D4   E4   S4
Fig. 1.11 Four stage instruction pipelining
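The timing of Fig. 1.11 follows a simple rule: with k stages of equal duration, the first instruction needs k time units and every further instruction completes one time unit later, so n instructions need k + n - 1 time units instead of n x k. The sketch below reproduces the figure under that assumption of an ideal pipeline without stalls; the function names are hypothetical.

```python
STAGES = "FDES"   # Fetch, Decode, Execute, Store


def sequential_time(n, k=len(STAGES)):
    """Time units when each instruction finishes before the next one starts."""
    return n * k


def pipelined_time(n, k=len(STAGES)):
    """Time units in an ideal k-stage pipeline without stalls."""
    return k + n - 1


def schedule(n):
    """One row per instruction; instruction i enters the pipeline in cycle i."""
    return [["  "] * i + [f"{s}{i + 1}" for s in STAGES] for i in range(n)]


for row in schedule(4):
    print(" ".join(row))
print(sequential_time(4), pipelined_time(4))
```

For 4 instructions this gives 16 time units without pipelining and 7 with it, matching the figures in the text; for a long instruction stream the average approaches one time unit per instruction.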
In this pipeline, up to four instructions are in progress at any given time. This
means that four distinct hardware units are needed, as shown in Fig. 1.12. These units are
implemented such that they can perform their tasks simultaneously
without interfering with one another. Information is passed from one stage to the next
stage with the help of buffers.
[Figure: Fetch instruction (F), Decode instruction and fetch operands (D), Execution operation (E) and Store result (S) units connected in sequence through interstage buffers B1, B2 and B3]
Fig. 1.12 Hardware organisation for four-stage instruction pipeline
Coming to performance analysis, pipelining reduces the
effective number of clock cycles required per instruction and thus significantly increases the
rate at which instructions are executed. Once the pipeline of Fig. 1.11 is full, one instruction
completes in every clock cycle, so the rate approaches the ideal value of one clock
cycle per instruction. In practice, however, this ideal value cannot
be attained for a variety of reasons.
The performance of a processor improves tremendously because of pipelining. There
are two types of pipelines in Pentium : instruction pipelines (U and V, covered in a later
section) and a bus cycle pipeline that performs special types of bus cycles. The instruction
pipelines include five stages. They operate independently.
Also, Pentium employs a branch prediction technique (explained in detail in a later
section). Normally, there is a flow of instructions through the U and V pipelines. With the
branch prediction technique, Pentium predicts whether or not the normal program flow
will change. This technique helps to keep a steady stream of instructions flowing into the
pipelines. This increases the rate of instruction execution, and hence the
performance of the Pentium improves. This feature of Pentium is much like that of a RISC
machine.
5. Extensive utilization of the compiler
When a program is written in a higher level language (e.g. C language), each statement
of the program is converted into assembly language
instructions during compilation. When we use a Pentium compiler, the advances in the Pentium architecture
can be exploited through optimizations of the assembly language code. Some examples
are given below.
a) Arrange some pairs of instructions such that they will execute in parallel in the
floating-point unit or dual-integer pipelines.
b) Reorder the instructions such that the Pentium’s branch prediction technique is
utilized properly.
c) If possible, replace an instruction with an equivalent instruction which requires
fewer clock cycles or fewer bytes of machine code. For
example, MOV EAX, 0 can be replaced by SUB EAX, EAX.
d) Use the instruction/data cache or algorithms to allocate the minimum number of
processor registers during parsing of an arithmetic statement.
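The saving in example (c) can be seen from the machine code of the two instructions. The sketch below writes the encodings out by hand, assuming 32-bit operand size; the constant and function names are illustrative only.

```python
# Hand-written machine code, assuming 32-bit operand size.
MOV_EAX_0 = bytes([0xB8, 0x00, 0x00, 0x00, 0x00])  # MOV EAX, imm32: opcode B8 followed by a 4-byte immediate
SUB_EAX_EAX = bytes([0x29, 0xC0])                   # SUB r/m32, r32: opcode 29 followed by ModR/M byte C0


def bytes_saved(original: bytes, replacement: bytes) -> int:
    """Number of machine-code bytes saved by the replacement instruction."""
    return len(original) - len(replacement)


if __name__ == "__main__":
    print(len(MOV_EAX_0), len(SUB_EAX_EAX), bytes_saved(MOV_EAX_0, SUB_EAX_EAX))
```

Note that the two instructions are equivalent only as far as EAX is concerned: SUB also modifies the flags, whereas MOV leaves them unchanged, so a compiler may make this substitution only where the flag values are not needed afterwards.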
Thus, a properly written Pentium compiler helps to achieve a high performance like in
a RISC or CISC machine.
From all above discussion, it is clear that the Pentium contains both RISC and CISC
characteristics.
1.7 Pentium Super-scalar Architecture
Processors capable of executing multiple instructions in parallel are known
as superscalar processors. The Pentium is capable, under special circumstances, of
executing two integer or two floating point instructions simultaneously and thus it supports
superscalar architecture. However, there are restrictions placed on a pair of integer
instructions attempting parallel execution. These restrictions are discussed in section 1.9.
For floating point instructions, there is a restriction on which instructions may
execute as the first instruction of a pair and which as the second instruction of the pair.
First instruction in the pair | Second instruction in the pair
FLD, FLD ST(i), FADD, FSUB, FMUL, FDIV, FCOM, FUCOM, FTST, FABS, FCHS | FXCH
Modern compilers play an important role in achieving superscalar-level performance
from the Pentium processor. During code generation they order the instructions
so as to form pairs of instructions without any data dependency and to make
allowable combinations of integer and floating-point instructions for simultaneous
execution.
1.8 Pipelining
In the previous section we have seen that the rate of instruction execution can be
improved with pipelining. In Pentium, there are two instruction pipelines, the U pipeline
and the V pipeline. These are five-stage pipelines and operate independently. The five stages,
in order, are as follows :
1. PF Prefetch
2. D1 Instruction Decode
3. D2 Address Generate
4. EX Execute, Cache and ALU Access
5. WB Writeback
[Figure: instruction stream entering the five stages PF, D1, D2, EX and WB in sequence, with the data stream leaving WB]
Fig. 1.13 Stages in U and V instruction pipelines
Both pipelines U and V include the above five stages. The U pipeline can execute any
processor instruction, but the V pipeline executes only simple instructions. An instruction
which does not require microcode control to execute and generally takes one clock cycle to
complete is referred to as a simple instruction. Examples are register-to-register MOVs,
INC, DEC and near conditional jumps (e.g. JZ, JNZ etc.). Note that some simple
instructions may take two or three clock cycles. These are arithmetic and logical
instructions that use both register and memory operands.
Refer Fig. 1.14 which shows the pipelined instruction execution.
Clock   1          2          3          4          5
        U    V     U    V     U    V     U    V     U    V
PF      I1   I2    I3   I4    I5   I6    I7   I8    I9   I10
D1                 I1   I2    I3   I4    I5   I6    I7   I8
D2                            I1   I2    I3   I4    I5   I6
EX                                       I1   I2    I3   I4
WB                                                  I1   I2
Fig. 1.14 Pipelined instruction execution
The following sequence of steps explains the pipelined instruction execution in
Pentium.
1. Prefetch (PF) stage : Instructions are prefetched from the instruction cache or
memory and fed into the PF stage of both the pipelines U and V.
2. Instruction Decode (D1) stage : In this stage, decoder in each pipeline checks if
the current pair of instructions can execute together. If the instruction contains a
prefix byte, an additional clock cycle is required in this stage. Also, such an
instruction may only execute in the U pipeline and may not be paired with any
other instruction.
3. Address Generate (D2) stage : In this stage, the addresses for the operands that
reside in memory are calculated.
4. Execute (EX) stage : In this stage, operands are read from the data cache or
memory and ALU operations are performed. Also, branch predictions for
instructions (except conditional branches in the V pipeline) are verified in this
stage.
5. Writeback (WB) stage : This is the final stage. In this stage, the results of the
completed instructions are written and the conditional branch instruction
predictions are verified.
When the instructions from both pipelines U and V reach the EX stage, it may
happen that one of them stalls and requires additional clock cycles for execution. No
work is done during the stall, so a pipeline stall lowers performance. There are various
situations in which instructions stall, for example when the operands required for the
operation are not found in the data cache. If the instruction in the U pipeline stalls, the
instruction in the V pipeline also stalls. But if the instruction in the V pipeline stalls, the
instruction in the U pipeline may continue executing. The instructions in both pipelines
must reach the last stage, WB, before another pair of instructions or the next single
instruction may enter the EX stage.
1.9 Instruction Pairing Rules
The Pentium processor can issue one or two instructions every clock. In order to issue
two instructions simultaneously they must satisfy the following conditions :
• Both instructions in the pair must be “simple” instructions.
Simple instructions are entirely hardwired; they do not require any microcode
control and, in general, execute in one clock. Examples of simple instructions are
register-to-register MOVs, INC, DEC and near conditional jumps (JZ, JNZ, etc.).
There is one more restriction on a conditional jump instruction : it must be the second
instruction in the pair. The arithmetic and logical instructions are also simple
instructions; however, they may take two or three clock cycles because these
instructions use both register and memory operands. Sequencing hardware is used
to allow them to function as simple instructions. The following integer instructions
are considered simple and may be paired :
1. MOV reg, reg/mem/imm
2. MOV mem, reg/imm
3. ALU reg, reg/mem/imm
4. ALU mem, reg/imm
5. INC reg/mem
6. DEC reg/mem
7. PUSH reg/mem
8. POP reg
9. LEA reg, mem
10. JMP/CALL/Jcc near
11. NOP
• Shifts or rotates can only pair in the U pipe
(SHL, SHR, SAL, SAR, ROL, ROR, RCL or RCR).
• ADC and SBB can only pair in the U pipe.
• JMP, CALL and Jcc can only pair in the V pipe (Jcc = jump on condition code).
• Neither instruction can contain BOTH a displacement and an immediate
operand.
For example :
mov [bx+2], 3 ; 2 is a displacement, 3 is immediate
mov mem1, 4 ; mem1 is a displacement, 4 is immediate
• Prefixed instructions can only pair in the U pipe. Prefixed instructions (such as
MOV AL, ES:[DI]) may only execute in the U pipeline. Therefore, only one
prefixed instruction is allowed in the pair.
The U pipe instruction must be only 1 byte in length or it will not pair until the
second time it executes from the cache.
• There should not be any data dependencies between them.
Data Dependency : A data dependency exists between two instructions if
the result of the first instruction is an operand of the second instruction
(read-after-write dependency). That is, the operand of the second instruction cannot be
read from the register until the first instruction has written its result to that register.
There can be no read-after-write or write-after-write register dependencies between
the instructions, except for special cases involving the flags register and the stack pointer.
mov ebx, 2 ; writes to EBX
add ecx, ebx ; reads EBX and ECX, writes to ECX
; EBX is read after being written, no pairing
mov ebx, 1 ; writes to EBX
mov ebx, 2 ; writes to EBX
; write after write, no pairing
The flags register exception allows an ALU instruction to be paired with a Jcc even
though the ALU instruction writes the flags and the Jcc reads the flags.
For example :
cmp al, 0 ; CMP modifies the flags
je addr ; JE reads the flags, but pairs
dec cx ; DEC modifies the flags
jnz loop1 ; JNZ reads the flags, but pairs
The stack pointer exception allows two PUSHes or two POPs to be paired even
though they both read and write to the SP (or ESP) register.
push eax ; ESP is read and modified
push ebx ; ESP is read and modified, but still pairs
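The dependency rule and its two exceptions can be expressed as a small check. The following Python sketch covers only the register-dependency condition, not the other pairing rules. The `Instr` record, the register sets and the function name are hypothetical; register aliasing (for example AL inside EAX) is ignored, and it is assumed here that two instructions which both write the flags do not block pairing.

```python
from dataclasses import dataclass

# Conditional jumps that read the flags (a small illustrative subset).
CONDITIONAL_JUMPS = {"je", "jne", "jz", "jnz"}


@dataclass(frozen=True)
class Instr:
    """Hypothetical description of an instruction by the registers it reads and writes."""
    mnemonic: str
    reads: frozenset = frozenset()
    writes: frozenset = frozenset()


def has_blocking_dependency(first: Instr, second: Instr) -> bool:
    """Return True if a register dependency prevents `first` and `second` from pairing."""
    raw = set(first.writes & second.reads)    # read-after-write
    waw = set(first.writes & second.writes)   # write-after-write
    # Assumption: two flag-writing instructions may still pair.
    waw.discard("eflags")
    # Flags exception: an ALU instruction followed by a conditional jump.
    if second.mnemonic in CONDITIONAL_JUMPS:
        raw.discard("eflags")
    # Stack pointer exception: two PUSHes or two POPs.
    if first.mnemonic == second.mnemonic and first.mnemonic in ("push", "pop"):
        raw.discard("esp")
        waw.discard("esp")
    return bool(raw or waw)


# The examples from the text, described by their register usage.
mov_ebx = Instr("mov", frozenset(), frozenset({"ebx"}))
add_ecx_ebx = Instr("add", frozenset({"ecx", "ebx"}), frozenset({"ecx", "eflags"}))
cmp_al = Instr("cmp", frozenset({"al"}), frozenset({"eflags"}))
je_addr = Instr("je", frozenset({"eflags"}), frozenset())
push_eax = Instr("push", frozenset({"eax", "esp"}), frozenset({"esp"}))
push_ebx = Instr("push", frozenset({"ebx", "esp"}), frozenset({"esp"}))

if __name__ == "__main__":
    print(has_blocking_dependency(mov_ebx, add_ecx_ebx))  # read after write
    print(has_blocking_dependency(mov_ebx, mov_ebx))      # write after write
    print(has_blocking_dependency(cmp_al, je_addr))       # flags exception
    print(has_blocking_dependency(push_eax, push_ebx))    # stack pointer exception
```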
1.10 Branch Prediction
We have seen that pipelined instruction execution is a valid technique for improving
the instruction execution rate and hence the performance of a processor. But the benefit is reduced when
program transfer instructions such as JMP, CALL, RET and the
conditional branch instructions are present in the instruction stream. With pipelined
instruction execution, the instruction pipeline is always filled with a
group of instructions stored in sequential memory locations. A program transfer
instruction changes this normal sequence of execution, so all the instructions
that entered the pipeline after it become incorrect. In this case, the
instructions at the branch target
must be loaded into the pipeline instead, and the instructions that were loaded wrongly must be
discarded. This is called ‘flushing’ of the pipeline. After flushing, the correct sequence of
instructions, starting at the branch target, is loaded into the
pipeline. No work is done while the pipeline stages are reloaded. These disturbances in the
pipelined instruction execution are called ‘bubbles’.
Pentium overcomes this problem by using a technique called ‘dynamic branch
prediction’. Whether the branch will be taken or not taken is decided by prediction. If the
prediction is correct, the pipeline is not flushed, no cycles are lost and there are no bubbles in
the pipeline. If the prediction is wrong, the pipeline is flushed, so cycles are wasted
and bubbles appear in the pipeline while it is loaded with the correct group of
instructions. Naturally, it is best if the predictions are correct most of the time. Pentium uses
a branch target buffer (BTB) for dynamic branch prediction. The BTB is a special cache which
stores the branch instructions that occur in the instruction stream and their target
addresses. The BTB also stores two history bits which indicate the execution history of the last two
branch instructions. The BTB uses the history bits to predict whether the branch is taken or not
taken. When a new target address is placed into the BTB, these history bits are set to 11.
When the corresponding branch instruction is encountered again, the history bits are updated. The
history bits become 00 if there are repeated failures to take a branch; the prediction then
becomes ‘not taken’. Fig. 1.15 shows the operations that take place in the dynamic branch
prediction technique.
[Figure: state diagram in which each state is labelled with its history bits and its prediction (BT or BNT); new branch instructions start in the state with history bits 11]
Fig. 1.15 Operations in dynamic branch prediction technique
Note : 1. Each state is represented by history bits, H and prediction, P.
2. Prediction is either ‘branch is taken’ indicated by BT or ‘branch is not taken’ indicated
by BNT.
The prediction remains ‘taken’ until both history bits become zero. The BTB is
accessed during the D1 stage of the U and V pipelines. For a new branch instruction, there is
no target address in the BTB and the prediction is ‘not taken’. There are two 32-byte
prefetch buffers. One buffer prefetches instructions from the current program address and the other
buffer prefetches instructions from the target address when the BTB’s prediction is ‘branch
taken (BT)’. If the predictions are correct, clock cycles are not wasted. If a prediction is
incorrect, or is correct but the target address is wrong, the pipelines are
flushed. This loses three clock cycles in the U pipeline and four clock cycles in the V
pipeline.
Most of the time, we use conditional jumps to form loops in programs. The
prediction ‘branch is taken (BT)’ suits the multiple passes required through a loop. In
Pentium, the history bits are set to 11 for a new entry. So, using dynamic branch
prediction, the wastage of clock cycles is minimised.
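The behaviour described above can be tried out with a small simulation. The text fixes three points: a new entry gets history bits 11, the prediction is ‘taken’ unless the bits are 00, and a branch with no BTB entry is predicted ‘not taken’. How the bits change in between is assumed in this Python sketch to be a simple count up (branch taken) and count down (branch not taken), and an entry is assumed to be created the first time the branch is taken; the class and function names are illustrative only.

```python
class BranchHistory:
    """Sketch of BTB history bits; the update rule is an assumption, not taken from the text."""

    NEW_ENTRY = 0b11  # history bits given to a new BTB entry

    def __init__(self):
        self.bits = {}  # branch address -> two history bits

    def predict_taken(self, address: int) -> bool:
        if address not in self.bits:
            return False                      # no BTB entry: predicted not taken
        return self.bits[address] != 0b00     # taken unless both bits are zero

    def update(self, address: int, taken: bool) -> None:
        if address not in self.bits:
            if taken:                         # assumed: entry is created when the branch is first taken
                self.bits[address] = self.NEW_ENTRY
            return
        step = 1 if taken else -1             # assumed: count up when taken, down when not taken
        self.bits[address] = max(0b00, min(0b11, self.bits[address] + step))


def count_mispredictions(history: BranchHistory, address: int, outcomes) -> int:
    """Run a sequence of actual branch outcomes and count the wrong predictions."""
    wrong = 0
    for taken in outcomes:
        if history.predict_taken(address) != taken:
            wrong += 1
        history.update(address, taken)
    return wrong


if __name__ == "__main__":
    btb = BranchHistory()
    loop = [True] * 9 + [False]               # loop branch: taken 9 times, then falls through
    print(count_mispredictions(btb, 0x1000, loop))
    print(count_mispredictions(btb, 0x1000, loop))
```

Under these assumptions a ten-pass loop costs two wrong predictions the first time it runs (on entry and on exit) and one on later runs, since a single ‘not taken’ outcome is not enough to bring the history bits down to 00.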
1.11 The Instruction and Data Caches
In this section, we will see the concept of cache memory, advantages of using caches
and Pentium cache organisation.
1.11.1 Cache Memory
In a computer system, the program which is to be executed is loaded in the main
memory (DRAM). The processor then fetches the code and data from the main memory to
execute the program. The DRAMs which form the main memory are slower devices, so it
is necessary to insert wait states in memory read/write cycles. This reduces the speed of
execution. To speed up the process, high speed memories such as SRAMs must be used.
But considering the cost and space required for SRAMs, it is not desirable to use SRAMs
to form the main memory. The solution to this problem comes from the fact that
most microcomputer programs work with only small sections of code and data at a
particular time. A small section of SRAM, referred to as cache memory, is added to the
memory system along with the main memory. The program which is to be executed is loaded in
the main memory, but the part of the program (code) and data that is in use at a particular time
is usually accessed from the cache memory. This is accomplished by loading the active
part of code and data from main memory to cache memory. The cache controller looks
after this swapping between main memory and cache memory with the help of the DMA
controller.
When the processor finds the addressed code or data in the cache, it is called a ‘cache hit’.
The percentage of accesses where the processor finds the code or data word it needs in the
cache memory is called the ‘hit rate’. It is given by,
$$\text{Hit rate} = \frac{\text{Number of hits}}{\text{Number of read/write bus cycles}} \times 100\%$$
The hit rate is normally greater than 90 percent. When the required code or data is not
found in the cache, it is called ‘cache miss’.
Thus, cache is a special type of high-speed RAM and is used to speed up accesses to
memory and reduce traffic on the processor’s buses. The advanced processors use an
on-chip cache to achieve high speed accesses to memory and hence higher performance.
Example 1.1 : The application program in a computer system with cache uses 1400
instruction acquisition bus cycles from cache memory and 100 from main memory. What is
the hit rate? If the cache memory operates with zero wait states and the main memory bus
cycles use three wait states, what is the average number of wait states experienced during
the program execution?
Solution :
$$\text{Hit rate} = \frac{1400}{1400 + 100} \times 100 = 93.3333\%$$
Total wait states = 1400 × 0 + 100 × 3 = 300
$$\text{Average wait states} = \frac{\text{Total wait states}}{\text{No. of memory bus cycles}} = \frac{300}{1500} = 0.2$$
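The arithmetic of Example 1.1 can be reproduced with two short functions. This is a minimal Python sketch; the function and parameter names are illustrative and not part of the text.

```python
def hit_rate_percent(hits: int, misses: int) -> float:
    """Hit rate as a percentage of all read/write bus cycles."""
    return 100.0 * hits / (hits + misses)


def average_wait_states(hits: int, misses: int, hit_wait: int, miss_wait: int) -> float:
    """Average wait states per bus cycle for the given cache and main memory wait states."""
    total_wait = hits * hit_wait + misses * miss_wait
    return total_wait / (hits + misses)


if __name__ == "__main__":
    # Values from Example 1.1: 1400 cycles served by the cache, 100 by main memory.
    print(round(hit_rate_percent(1400, 100), 4))
    print(average_wait_states(1400, 100, hit_wait=0, miss_wait=3))
```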
1.11.2 Two Level Cache System
We know that the on-chip cache is a high-speed cache. But its size is limited by space
constraints. Therefore, to design a high performance system, a secondary cache, called the
external cache, is used along with the primary on-chip cache. Such a system is called a two level
cache system, and in such systems the secondary cache is constructed with SRAM chips.
Fig. 1.16 shows a two level cache system in a microcomputer.
As shown in Fig. 1.16, an on-chip cache supplies instructions and data to the CPU's
pipeline. When code or data is required from memory, the processor first searches for it in
the on-chip cache (internal cache). If it is found in the internal cache (a cache hit), a copy
of it is sent to the pipeline very quickly. Usually, it takes just one clock cycle. If it is not found
in the internal cache (a cache miss), the processor examines the external cache (the second
level cache). If a cache miss occurs at the external cache or there is no external cache, the
processor accesses the main memory. The processor also writes a copy of the code or data
from main memory to the cache.
[Figure: external (secondary) cache connected to the on-chip (primary) cache, which supplies instructions (codes) to the pipeline]
Fig. 1.16 Two-level cache system in a microcomputer
The secondary cache is much slower than the primary cache. But the secondary
cache is large, which ensures a high hit rate. The secondary cache thus reduces the impact
of the main memory speed on the performance of a computer. The average access time
experienced by the CPU in a two level cache system is
$$t_{avg} = h_1 t_{L1} + (1 - h_1) h_2 t_{L2} + (1 - h_1)(1 - h_2) t_m$$
where $t_{L1}$ is the access time and $h_1$ is the hit rate of $L_1$,
$t_{L2}$ is the access time and $h_2$ is the hit rate of $L_2$,
$t_m$ is the access time of main memory.
The fraction of accesses that hit in the secondary cache is given by the term $(1 - h_1) h_2$ and the
fraction that miss in the secondary cache is given by the term $(1 - h_1)(1 - h_2)$.
A ‘hit-ratio’ specifies the ratio of hits to total cache accesses. If the hit ratio is 0.9,
then it means that the cache contains the requested information nine times out of ten.
Thus, the average access time depends on the hit ratio. The average access time is given
by,
$$T_{avg} = \text{hit-ratio} \times T_{cache} + (1 - \text{hit-ratio}) \times (T_{cache} + T_{RAM})$$
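Both access time formulas are easy to evaluate numerically. In the Python sketch below the hit ratios and access times (in nanoseconds) are made-up illustrative values, not figures for any particular Pentium system, and the function names are hypothetical.

```python
def two_level_access_time(h1: float, t_l1: float, h2: float, t_l2: float, t_m: float) -> float:
    """Average access time for a primary cache, a secondary cache and main memory."""
    return h1 * t_l1 + (1 - h1) * h2 * t_l2 + (1 - h1) * (1 - h2) * t_m


def single_level_access_time(hit_ratio: float, t_cache: float, t_ram: float) -> float:
    """Average access time when a miss costs the cache lookup plus the RAM access."""
    return hit_ratio * t_cache + (1 - hit_ratio) * (t_cache + t_ram)


if __name__ == "__main__":
    # Illustrative values only: times in ns, hit ratios as fractions.
    print(two_level_access_time(h1=0.9, t_l1=10, h2=0.8, t_l2=30, t_m=100))
    print(single_level_access_time(hit_ratio=0.9, t_cache=10, t_ram=70))
```

With these numbers the two level system averages about 13.4 ns per access and the single level system about 17 ns, which shows how strongly the result depends on the hit ratios.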
1.11.3 Pentium Cache Organisation
Pentium processor provides separate caches for data and code. Both caches are
organized as two-way set-associative caches with 128 sets. This gives 256 entries per cache.
There are 32 bytes in a line (64 bytes per set), resulting in 8 KB of storage per cache. The
data and instruction caches may be accessed simultaneously.
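The sizes quoted above follow directly from the cache geometry. The Python sketch below recomputes them and also shows how a 32-bit address would divide into tag, set index and byte offset. This field split is the standard one for a set-associative cache and is an assumption here, since the text does not spell it out; all names are illustrative.

```python
LINE_BYTES = 32     # bytes per cache line
WAYS = 2            # lines per set (two-way set-associative)
SETS = 128          # number of sets
ADDRESS_BITS = 32   # width of the address used to access the cache

OFFSET_BITS = LINE_BYTES.bit_length() - 1            # 5 bits select a byte within a line
INDEX_BITS = SETS.bit_length() - 1                   # 7 bits select one of the 128 sets
TAG_BITS = ADDRESS_BITS - INDEX_BITS - OFFSET_BITS   # remaining bits are stored as the tag


def cache_size_bytes() -> int:
    """Total data storage of one cache."""
    return LINE_BYTES * WAYS * SETS


def split_address(address: int):
    """Split an address into (tag, set index, byte offset)."""
    offset = address & (LINE_BYTES - 1)
    set_index = (address >> OFFSET_BITS) & (SETS - 1)
    tag = address >> (OFFSET_BITS + INDEX_BITS)
    return tag, set_index, offset


if __name__ == "__main__":
    print(cache_size_bytes(), WAYS * SETS, TAG_BITS)
    print([hex(field) for field in split_address(0x12345678)])
```

The lookup uses the set index to select a set and then compares the tag of the address with the tags of the two lines in that set.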
The Fig. 1.17 shows the internal structure of the instruction and data caches. As mentioned
earlier, each consists of 128 sets of two lines each. Each line is associated with a tag. The
data cache tags are triple ported, meaning that they can be accessed from three different places at the
same time. Two of these ports serve the U and V pipelines, which access the data cache to
read/write instruction operands. The third port is used for a special operation called bus
snooping. The code cache tags are also triple ported, to support snooping and split line
accesses. [The snooping is used to maintain consistent data in a multiprocessor system
where each processor has a separate cache].
[Figure: cache array of 128 sets (up to Set 127), each set holding two 32-byte lines]
Fig. 1.17 Structure of 8 KB instruction and data cache
Each entry in the data cache can be configured for write back or write through. The
code cache is an inherently write-protected cache. It is write protected to prevent
self-modifying code from changing the executing program.
Each cache uses parity bits to maintain data integrity. Each tag is provided with one
parity bit. There is one parity bit for every eight bytes of data (a quarter of a line or entry)
in the instruction cache.
In Pentium, individual pages can be configured as cacheable or non-cacheable by
software or hardware. The caches can be enabled or disabled by software or hardware.
Translation Lookaside Buffers
Each cache has dedicated Translation Lookaside Buffers (TLBs) to translate linear
addresses into physical addresses. Physical addresses are used to access the cache because
the same address is used to access main memory. The TLBs are also caches. The Table 1.8
gives the information of TLBs in the data cache and instruction cache.
TLB in data cache | TLB in instruction cache
2 TLBs | 1 TLB
First : 4-way set associative with 64 entries. It translates addresses for 4 KB pages of main memory. | 4-way set-associative with 32 entries. It translates addresses for both 4 KB pages and 4 MB pages of main memory.
Second : 4-way set associative with 8 entries. It translates addresses for 4 MB pages of main memory. | 4 MB pages are cached as blocks of 4 KB each.
Both TLBs are parity protected and dual ported. | TLB is parity protected.
The cache controller uses the Least-Recently-Used (LRU) algorithm to replace entries in all
three TLBs. For that, Pentium provides a 3-bit LRU counter for each set.
Table 1.8 TLBs for data and instruction cache
The Fig. 1.18 shows the overall cache organisation for Pentium processor.
[Figure: data cache (2-way set associative, 8 KB = 32 bytes × 2 × 128) with two 4-way set-associative TLBs of 64 and 8 entries; instruction cache (2-way set associative, 8 KB = 32 bytes × 2 × 128) with one 4-way set-associative TLB of 32 entries]
Fig. 1.18 Pentium cache organisation
Translating Linear Address into Physical Addresses with a TLB
The Fig. 1.19 shows how a TLB is used to translate a linear address into a physical address.
The upper 20 bits of the linear address are checked against four tags and, in case of a hit,
translated into the upper 20 bits of the physical address. The lower 12 bits of the physical
address are the same as the lower 12 bits of the linear address.
[Figure: the upper 20 bits of the linear address are used as the tag for the 4-way set-associative TLB, which supplies the upper 20 bits of the physical address (the actual 4 KB page base address); the lower 12 bits pass through unchanged]
Fig. 1.19 Generation of physical address from linear address
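The address split used in this translation can be sketched in a few lines. In the following Python sketch a dictionary stands in for the TLB, so the 4-way set-associative lookup and the page-table walk on a miss are not modelled; the names and the example mapping are hypothetical, and only 4 KB pages are considered.

```python
PAGE_SHIFT = 12                      # 4 KB pages: the lower 12 bits are the offset within the page
OFFSET_MASK = (1 << PAGE_SHIFT) - 1


def translate(linear_address: int, tlb: dict):
    """Return the physical address for a 4 KB page, or None on a TLB miss."""
    page_number = linear_address >> PAGE_SHIFT   # upper 20 bits of the linear address
    offset = linear_address & OFFSET_MASK        # lower 12 bits, copied unchanged
    frame_number = tlb.get(page_number)          # upper 20 bits of the physical address
    if frame_number is None:
        return None                              # miss: the page tables would have to be consulted
    return (frame_number << PAGE_SHIFT) | offset


if __name__ == "__main__":
    example_tlb = {0x12345: 0x00ABC}             # hypothetical mapping: linear page -> physical page
    print(hex(translate(0x12345678, example_tlb)))
    print(translate(0x00001000, example_tlb))
```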
Cache Coherency in a Multiprocessor System
Cache updating systems eliminate data inconsistency in the main memory caused by
cache write operations. However, in multiprocessor systems, several processors may require
the same memory block, and each stores a copy of it in its
own cache. If each processor is then allowed to update the data in its own cached
copy, an inconsistent view of memory can result. This
problem is known as the ‘cache coherence’ problem.
To avoid such inconsistency and to maintain cache coherency in its data cache,
Pentium uses the MESI (Modified/Exclusive/Shared/Invalid) protocol. The MESI protocol uses two
bits to record the state of each cache line. The state of each cache line is
marked as modified, exclusive, shared or invalid. The meaning of each state in this
protocol is as given below.
• Modified : The line in the cache has been modified, so it differs from main memory, and
this line is available only in this cache.
• Exclusive : The line in the cache is the same as that in main memory and it is not
present in any other cache.
• Shared : The line in the cache is the same as that in main memory and the same
line may be present in one or more other caches.
• Invalid : The line in the cache does not contain valid data.
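The four states can be tied together with a small state-transition sketch. The rules below are the generic textbook form of MESI and are an assumption here, because the text gives only the meaning of each state and not the Pentium's exact transition rules; the function names are illustrative only.

```python
from enum import Enum


class State(Enum):
    MODIFIED = "M"
    EXCLUSIVE = "E"
    SHARED = "S"
    INVALID = "I"


def on_local_read(state: State, other_caches_have_line: bool) -> State:
    """This processor reads the line."""
    if state is State.INVALID:
        # The line is fetched; it is Shared if another cache also holds it, else Exclusive.
        return State.SHARED if other_caches_have_line else State.EXCLUSIVE
    return state


def on_local_write(state: State) -> State:
    """This processor writes the line (generic MESI: other copies are invalidated)."""
    return State.MODIFIED


def on_snooped_read(state: State) -> State:
    """Another processor is seen reading the line during bus snooping."""
    if state in (State.MODIFIED, State.EXCLUSIVE):
        return State.SHARED   # a Modified line is written back to memory first
    return state


def on_snooped_write(state: State) -> State:
    """Another processor is seen writing the line during bus snooping."""
    return State.INVALID


if __name__ == "__main__":
    s = on_local_read(State.INVALID, other_caches_have_line=False)
    print(s)                      # line loaded, no other copy
    s = on_local_write(s)
    print(s)                      # line now differs from main memory
    print(on_snooped_read(s))     # another processor reads the same line
```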
1.12 Floating Point Unit
In 8086, 80286 and 80386, floating point operations were performed with the help of
external coprocessors. Table 1.9 gives the list of coprocessors used with the 80x86 family.
Processor (80x86 family) | Coprocessor (80x87 family)
8086/88 | 8087
80286 | 80287
80386 | 80387
Table 1.9 Processors and coprocessors
The 80x87 coprocessor, when used with an 80x86 processor, shared the address bus, data bus
and control bus with the processor. Considerable time was required for synchronization
between the processor and the coprocessor to perform floating point operations. This
problem was solved by placing the coprocessor on the processor chip. This was done for the 80486
and Pentium. Since the coprocessor is on the same chip as the processor, communication is
faster and execution takes place quickly. Thus, there is an internal floating point unit
(FPU) in the 80486 and Pentium. The Pentium contains an improved, totally redesigned FPU
compared with that used in the 80486.
Many floating point instructions that needed a large number of clock cycles on the 80x87
coprocessor units need only a few clock cycles on the 80486 and Pentium. New
algorithms also increase the speed of floating point operations. Consider an example of a
floating-point multiply instruction, FMUL. Table 1.10 shows the number of clock cycles
required for the execution of this instruction for different coprocessors.
Coprocessor (80x87 family) | Minimum clock cycles required
8087 | 130
80287 | 130
80387 | 29
80486 FPU | 16
Pentium FPU | 1
Table 1.10 FMUL instruction performance
Thus, for many floating point instructions, there is an improvement in each generation,
and highest improvement in the Pentium’s FPU. Such a high speed of FPU is achieved
using a pipeline. A FPU pipeline contains eight stages as shown in Fig. 1.20.
[Figure: instruction stream passing through the eight FPU pipeline stages]
Fig. 1.20 Stages in FPU pipeline
As shown in Fig. 1.20, the first five stages are the ones that form the U pipeline, which
processes integer instructions. The only difference is in the fifth stage. In the U pipeline the fifth
stage is WB (Writeback, as discussed earlier). In the FPU pipeline, this fifth stage
becomes the first stage of floating point execution. The FPU pipeline consists of these
five stages and three extra stages. Thus there are eight stages in total. All these stages and
their functions are explained below.
i) PF: Prefetch
ii) D1: Instruction decode
iii) D2: Address generation
iv) EX : Memory and register read, floating-point data converted into memory format,
memory write. The above stages are already explained in the section on pipelining.
v) X1 : Stage one in floating point execution. In this stage, memory data is converted
into floating-point format and the operand is written to the floating point register file. Using
bypass 1, data is sent back to the EX stage. This allows a floating point register write
operation in the X1 stage to bypass the floating point register file. The result is
sent to the instruction performing a floating-point register read in the EX stage.
vi) X2 : Stage two in the floating-point execution.
vii) WF : Round floating-point result and write to floating-point register file. The bypass 2
path is followed to send data back to the EX stage. Using bypass 2, the result of an
arithmetic instruction in stage WF is made available to the next instruction fetching
operands in the EX stage.
viii) ER : Error reporting. The status word is updated.
There are eight 80-bit floating-point registers in the floating-point register file, ST(0)
through ST(7). Two read and two write operations can be performed simultaneously since
there are two ports in read section and two ports in write section. The data is written to
the two write ports from the X1 and WF stages of the pipeline.
Pentium’s FPU is designed such that fast floating-point execution can be achieved.
Review Questions
1. Briefly explain the historical evolution of microprocessors.
2. Explain important features of 80286 microprocessor.
3. Explain important features of 80386 microprocessor.
4. Explain important features of 80486 microprocessor.
5. Explain important features of Pentium microprocessor.
6. Explain the significant additions and enhancements in the Pentium processor.
7. Draw and explain the block diagram of Pentium processor.
8. Explain any five Pentium processor signals.
9. Draw the programmer's model of Pentium in real mode.
10. Give the maximum memory addresses available in real mode.
11. Describe how physical address is obtained in real mode.
12. Write short notes on
a) Pentium RISC features. b) Pentium super-scalar architecture.
13. What do you mean by simple instructions ?
14. What is data dependency ?
15. What is pipelining ?
16. Explain the pipelined instruction execution with the help of block diagram.
17. Explain the instruction pairing rules with the help of suitable examples.
18. What is branch prediction ?
19. Explain the dynamic branch prediction technique used in Pentium processors.
20. What is cache memory ?
21. Define hit rate.
22. Explain the two level cache system.
23. Draw and explain the internal structure of instruction and data cache of Pentium processor.
24. What is bus snooping ?
25. What is TLB ?
26. Write a short note on instruction and data cache in Pentium.
27. Draw and explain the Pentium cache organisation.
28. Explain how linear address is converted into physical address using TLBs.
29. What is cache coherency ?
30. How does Pentium maintain cache coherency ?
31. What is MESI protocol ?
32. Write a short note on floating point unit of Pentium processor.
Bus Cycles and Memory and I/O Organisation
2.1 Introduction
This chapter gives information about the bus cycles and the memory and I/O organisation of
the Pentium processor. We begin with the explanation of the RESET operation.
2.2 RESET Operation
When the reset pin of the Pentium is activated, the BIST (Built-In Self-Test) of the Pentium is
initiated. The BIST tests 70 percent of the internal structure of the Pentium in
approximately 150 µs. As in the 80486, the test report is stored in the EAX register.
The test has passed if EAX is zero. The value of EAX can be tested after a reset to determine whether
an error was detected. The table 2.1 shows the values in the various registers of the Pentium
processor after reset.
Register | Reset value
EAX | 0 (if test passes)
EDX | 0500XXXXH
EBX, ECX, ESP, EBP, ESI and EDI | 0
EFLAGS | 2
EIP | 0000FFF0H
CS | F000H
DS, ES, FS, GS and SS | 0
GDTR and TSS | 0
CR0 | 60000010H
CR2, CR3 and CR4 | 0
DR0-DR3 | 0
DR6 | FFFF0FF0H
DR7 | 00000400H
Table 2.1 Register values in Pentium processor after reset
2.3 Bus Operations and Bus Cycles
The Pentium processor performs the following operations over its address and
data buses :
• Data transfers (both single cycle and burst transfers).
• Interrupt acknowledge cycles.
• Inquire cycle for examining the internal code and data cache.
• I/O cycle.
In this section, we are going to discuss the basic operations and the purpose of these
bus cycles.
The current bus cycle in the Pentium processor is decided by the state of the M/IO, D/C,
W/R, CACHE and KEN signals. This is illustrated in Table 2.2.
M/IO  D/C  W/R  CACHE  KEN   Cycle description
 0     0    0     1     x    Interrupt acknowledge
 0     0    1     1     x    Special cycle
 0     1    0     1     x    I/O read non-cached
 0     1    1     1     x    I/O write non-cached
 1     0    0     1     x    Code read 8 bytes non-cached
 1     0    0     x     1    Code read 8 bytes non-cached
 1     0    0     0     0    Code read 32 bytes burst cached
 1     1    0     1     x    Memory read up to 8 bytes non-cached
 1     1    0     x     1    Memory read up to 8 bytes non-cached
 1     1    0     0     0    Memory read 32 bytes burst cached
 1     1    1     1     x    Memory write up to 8 bytes non-cached
 1     1    1     0     x    32 byte cache write back burst
Note : x = don't care
Table 2.2 Bus Cycle Encodings
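The encodings of Table 2.2 can be expressed as a small lookup routine. The following Python sketch is only an illustration of how the table is read; the function name and the wording of the returned strings are assumptions made for this example, and signal levels are passed as plain 0/1 values.

```python
# Sketch of a Table 2.2 lookup. Inputs are the logic levels (0 or 1) of the
# bus cycle definition signals; the function name is hypothetical.

def decode_bus_cycle(m_io, d_c, w_r, cache, ken):
    """Return the Table 2.2 description for one combination of signals."""
    if m_io == 0:
        # Rows with M/IO = 0. Table 2.2 lists CACHE = 1 for all of them,
        # so CACHE and KEN are not examined here.
        io_rows = {
            (0, 0): "Interrupt acknowledge",
            (0, 1): "Special cycle",
            (1, 0): "I/O read, non-cached",
            (1, 1): "I/O write, non-cached",
        }
        return io_rows[(d_c, w_r)]

    if w_r == 1:
        if d_c == 0:
            raise ValueError("code write is not listed in Table 2.2")
        if cache == 0:
            return "32-byte cache write-back burst"
        return "Memory write, up to 8 bytes, non-cached"

    # Memory or code read: a burst line fill needs CACHE = 0 and KEN = 0.
    burst = (cache == 0) and (ken == 0)
    if d_c == 0:
        if burst:
            return "Code read, 32 bytes, burst cached"
        return "Code read, 8 bytes, non-cached"
    if burst:
        return "Memory read, 32 bytes, burst cached"
    return "Memory read, up to 8 bytes, non-cached"


if __name__ == "__main__":
    print(decode_bus_cycle(1, 1, 0, 0, 0))
    print(decode_bus_cycle(0, 1, 1, 1, 0))
```

Note that a read becomes a cached burst only when both CACHE and KEN are low; either signal being high on its own gives a non-cached read, which is why the table carries two non-cached rows for each read type.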
Additional decoding is required to indicate special bus cycles. The byte enable outputs
decide the currently running special bus cycle, as shown in the Table 2.3.
BE7  BE6  BE5  BE4  BE3  BE2  BE1  BE0   Special Bus Cycle
 1    1    1    1    1    1    0    1    Flush cache
 1    1    1    1    1    0    1    1    Halt
 1    1    1    1    1    1    1    0    Shutdown
 1    1    1    1    0    1    1    1    Writeback
 1    1    1    0    1    1    1    1    Flush acknowledge
 1    1    0    1    1    1    1    1    Branch trace message
Table 2.3 Special bus cycles
2.4 Bus Cycle States
The state of a Pentium bus cycle depends on the type of bus cycle being processed.
There are six possible states for a Pentium bus cycle. These are :
Ti (Idle state) : After hardware reset the Pentium bus is in the idle state. In this state, no bus
cycle is currently running.
T1 (First state) : This is the first state of the bus cycle. During T1, a valid address is
output on the address lines and ADS is activated.
T2 (Second state) : This is the second state of the bus cycle. During T2, data is read or
written and the BRDY input is examined.
T12 : It indicates the overlapping period of the first and second states. This state exists
when a second bus cycle starts before the first one completes. The data for the first bus
cycle is transferred, and a new address is output on the address lines.
T2P : This state exists when two bus cycles are outstanding and both are past their first
state. The data for the first bus cycle is transferred and the BRDY input is examined.
TD : This state inserts a dead state between two consecutive cycles.
Fig. 2.1 shows the state transition diagram for the Pentium processor.
Fig. 2.1 State transition diagram for Pentium processor
2.5 Non-Pipelined Bus Cycles
Fig. 2.2 Typical nonpipelined bus cycle
Fig. 2.2 shows a typical nonpipelined bus cycle. During T1, the Pentium sends the
address, bus status signals and control signals. In case of a write cycle, the data to be output is
also sent on the data bus during T1. As shown in the figure, after the address access time
the read or write data transfer takes place over the data bus. This activity is carried out in T2.
2.5.1 Non-pipelined Read Cycle
Fig. 2.3 shows the timings for two nonpipelined read cycles
(with and without a wait state). The first read cycle is without a wait state and the second cycle is
with a wait state. The sequence of events for the nonpipelined read cycle is as follows :
The read operation starts at the beginning of phase 1 in the T1 state of the bus cycle. In
this phase, the Pentium sends the address on the address bus and drives the byte enable signals
according to the data transfer type. After sending the address, in the same phase, the Pentium
activates its ADS signal to indicate that a valid address is placed on the address bus. In phase 1 of
the T1-state the Pentium also activates the bus cycle definition signals. For a read cycle W/R is low.
M/IO is high for a memory read and low for an I/O read. The D/C signal differentiates between data and
instruction code. This signal is high if data is to be read and low if an instruction code is
to be read. At the end of phase 2 of the T1-state, ADS is returned to its inactive logic 1 state.
The address bus, byte enable pins, and bus status pins remain active through the end of
the read cycle.
At the end of phase 2 of the T2-state the BRDY signal is sampled by the Pentium. A logic 1
on this signal inserts a wait state in the current bus cycle to extend the
bus cycle. In a wait state (Tw), the signals from the T2-state are maintained throughout the wait
state period. It is just a repetition of the T2-state. Thus the period of one wait state (Tw = T2) is
equal to one clock period, for example 50 ns for 20 MHz clock operation.
Fig. 2.3 Non-pipelined read cycle
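The effect of BRDY on the length of a nonpipelined cycle can be illustrated with a short model. The Python sketch below is a simplification under stated assumptions: it only counts T-states, takes the BRDY levels sampled at the end of T2 and of each following wait state as a list (0 = active), and uses the 20 MHz clock from the text as a default. The function name is hypothetical.

```python
# Simplified model of one nonpipelined bus cycle: T1, T2, then one Tw for
# every sample in which BRDY is found inactive (logic 1).

def run_nonpipelined_cycle(brdy_samples, clock_mhz=20.0):
    """Return (list of states, total time in ns) for one bus cycle.

    brdy_samples[0] is the BRDY level at the end of T2, brdy_samples[1]
    the level at the end of the first Tw, and so on. 0 means ready.
    """
    states = ["T1", "T2"]
    for level in brdy_samples:
        if level == 0:
            break              # device ready: the cycle ends here
        states.append("Tw")    # BRDY inactive: repeat T2 as a wait state
    else:
        raise ValueError("BRDY never became active")
    period_ns = 1000.0 / clock_mhz
    return states, len(states) * period_ns


if __name__ == "__main__":
    print(run_nonpipelined_cycle([0]))      # cycle without wait state
    print(run_nonpipelined_cycle([1, 0]))   # cycle with one wait state
```

With the default clock each state lasts 50 ns, so the cycle without a wait state takes two clock periods and each wait state adds one more.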
2.5.2 Non-pipelined Write Cycle
Fig. 2.4 shows the timings for two nonpipelined write cycles
(with and without a wait state). The first write cycle is without a wait state and the second cycle is
with a wait state. The sequence of events for the nonpipelined write cycle is as follows :
• The nonpipelined write cycle is similar to the nonpipelined read cycle. The write
operation starts at the beginning of phase 1 in the T1 state of the bus cycle. In this
phase, the Pentium sends the address on the address bus and drives the byte enable signals
according to the data transfer type. After sending the address, in the same phase, the Pentium
activates its ADS signal to indicate that a valid address is placed on the address bus. In phase 1 of
the T1-state the Pentium also activates the bus cycle definition signals. For a write cycle,
W/R is high. M/IO is high for a memory write and low for an I/O write. The D/C signal is
high.
• At the beginning of phase 2 of the T1-state, the Pentium sends data on the data bus. This
data remains valid until the start of phase 2 of the T1-state of the next bus cycle.
• At the end of phase 2 of the T1-state, ADS is returned to its inactive logic 1 state.
The address bus, byte enable pins, and bus status pins remain active through the
end of the write cycle.
At the end of phase 2 of the T2-state the BRDY signal is sampled by the Pentium. A
logic 1 on this signal inserts a wait state in the current bus cycle to extend the bus cycle. In a
wait state (Tw), the signals from the T2-state are maintained throughout the wait state period.
It is just a repetition of the T2-state.
Fig. 2.4 Non-pipelined write cycle
2.6 Pipelined Read/Write Cycle
As mentioned earlier, address pipelining allows bus cycles to be overlapped, increasing
the amount of time available for the memory or I/O device to respond. Fig. 2.5 shows
both nonpipelined and pipelined read and write cycles. Cycle 1 and cycle 2 in the
diagram show nonpipelined write and read cycles, respectively, whereas cycle 3 and cycle
4 in the diagram show pipelined write and read cycles, respectively. This diagram also
shows how a wait state can be avoided using a pipelined bus cycle.
In the pipelined bus cycle the address for the next bus cycle is sent during the
T2-state of the current cycle. In the Pentium, the NA (next address) signal initiates address
pipelining. The Pentium samples the NA signal at the beginning of phase 2 of any T state in
which ADS is not active, specifically :
• In the second T-state of a non-pipelined address cycle
• In the first T-state of a pipelined address cycle
• In any wait state of a non-pipelined address or pipelined address cycle unless NA
has already been sampled active.
In Fig. 2.5 NA is tested as 0 (active) during T2 of cycle 2, which ensures that the Pentium
executes the next cycle as a pipelined bus cycle. Cycle 2 (nonpipelined read cycle) is
also extended with one wait state because the BRDY pin is not active. In the wait state, the valid
address for the next bus cycle is sent on the address bus, as the next bus cycle is a pipelined bus
cycle.
Fig. 2.5 Pipelined Read/Write Cycle
The next cycle (cycle 3) is a pipelined write cycle. In this, data is sent on the data bus in
phase 2 of the T1P-state and remains valid for the rest of the cycle. The BRDY signal is
sampled at the end of the T2P-state. As it is low, the write cycle is completed without a wait state.
Fig. 2.5 shows NA is active during T1P of cycle 3, which ensures that the Pentium
executes the next cycle as a pipelined bus cycle.
The next cycle (cycle 4) is a pipelined read cycle. In this, the BRDY signal is tested 0 at the
end of phase 2 of the T2P-state. This means that the read cycle is completed without a wait state.
It is important to note that due to the pipelined address cycle the access time is extended and one
state (T-wait) of the read cycle is saved.
2.7 Burst Cycle
In the Pentium, memory data can be read using a burst cycle. It is the most efficient
way of accessing data. The burst cycle in the Pentium transfers four 64-bit numbers per
burst cycle in five clocking periods. This is illustrated in Fig. 2.6. Therefore, with a
15.15 ns clock period (66 MHz), a burst cycle without wait states requires on average
(15.15 ns × 5)/4 ≈ 18.94 ns for each memory data transfer.
Fig. 2.6 Burst cycle for Pentium processor
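The burst timing arithmetic can be checked with a few lines of Python. This is a minimal sketch, assuming the figures quoted in the text (15.15 ns clock period, five clocks per burst, four 64-bit transfers); the function name and the derived transfer rate are illustrative additions.

```python
# Worked arithmetic for a zero-wait-state burst cycle.

def burst_timing(clock_period_ns=15.15, clocks_per_burst=5,
                 transfers=4, bytes_per_transfer=8):
    """Return (total ns, average ns per transfer, bytes moved, Mbytes/s)."""
    total_ns = clock_period_ns * clocks_per_burst
    average_ns = total_ns / transfers
    bytes_moved = transfers * bytes_per_transfer
    # bytes per nanosecond multiplied by 1000 gives 10^6 bytes per second
    mbytes_per_s = bytes_moved / total_ns * 1000.0
    return total_ns, average_ns, bytes_moved, mbytes_per_s


if __name__ == "__main__":
    total, average, moved, rate = burst_timing()
    print(f"{moved} bytes in {total:.2f} ns, "
          f"{average:.2f} ns per transfer, about {rate:.0f} Mbytes/s")
```

One burst therefore moves 32 bytes, which matches the 32-byte cached burst reads listed in Table 2.2.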
2.8 Memory Organisation
The memory system for the Pentium processor is 4 Gbytes in size, the same as in the
80386DX and 80486 microprocessors. The only difference is in the width of the memory
data bus. The Pentium uses a 64-bit data bus to address memory organised in eight banks
that each contain 512 Mbytes of data. This is illustrated in Fig. 2.7. As shown in
Fig. 2.7, the Pentium memory system is divided into eight banks that each store a byte of
data with a parity bit. The memory system has 4 Gbytes of memory, beginning at location
00000000H and ending at location FFFFFFFFH. Bank selection is accomplished by the bank
enable signals (BE7-BE0), one for each bank. These separate memory banks allow the
Pentium to access any single byte, word, double word or quad word with one memory
transfer cycle.
Fig. 2.7 Pentium memory system organisation
In the Pentium, a double-precision floating point number can be retrieved in one read
cycle because a double-precision floating point number is 64 bits wide and the data bus of
the Pentium is also 64 bits wide.
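The relation between an operand's address, its size and the bank enable signals can be sketched as follows. This Python fragment is an illustration only: the function name is hypothetical, bit n of the returned mask stands for BEn with 0 meaning active (low), and it assumes the operand lies within one 8-byte quadword, since an operand that crosses a quadword boundary cannot be moved in one transfer cycle.

```python
# Which banks (byte enables) are selected for an access of a given size.

def byte_enables(address, size):
    """Return (quadword address, BE7..BE0 mask with 0 = active)."""
    if size not in (1, 2, 4, 8):
        raise ValueError("size must be 1, 2, 4 or 8 bytes")
    offset = address & 0x7                    # position inside the quadword
    if offset + size > 8:
        raise ValueError("operand crosses a quadword boundary")
    active = ((1 << size) - 1) << offset      # 1 bits mark the selected banks
    be_mask = ~active & 0xFF                  # the signals are active low
    return address & ~0x7, be_mask


if __name__ == "__main__":
    quad, mask = byte_enables(0xA004, 4)      # double word at A004H
    print(f"{quad:05X}H  BE7..BE0 = {mask:08b}")
```

For the double word at A004H the upper four banks are enabled, which is the case used later in the data bus steering discussion.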
The Pentium has the ability to check and generate parity for the address bus (A31 - A5)
during certain operations. The AP signal provides the system with parity information and
the APCHK signal indicates a bad parity check for the address bus. The Pentium takes no action
when an address parity error is detected. Therefore, in the Pentium the error must be assessed
by the system and the system must take appropriate action (an interrupt), if so desired.
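A parity bit over the address lines can be computed as shown in the sketch below. It assumes even parity (the parity bit is chosen so that the total number of ones, including the parity bit, is even) taken over A31 - A5 only; the function names are hypothetical and the code is an illustration, not a description of the processor's internal logic.

```python
# Even-parity generation and checking over address lines A31-A5.

def address_parity(address):
    """Return the even-parity bit for address lines A31-A5."""
    bits = (address >> 5) & 0x7FFFFFF      # keep the 27 lines A31..A5
    ones = bin(bits).count("1")
    return ones & 1                        # 1 only when the count is odd


def parity_error(address, ap):
    """Return True when the supplied AP bit does not match the address."""
    return address_parity(address) != ap


if __name__ == "__main__":
    addr = 0x000A0020
    ap = address_parity(addr)
    print(ap, parity_error(addr, ap), parity_error(addr, ap ^ 1))
```

A mismatch reported by such a check corresponds to the condition that APCHK signals to the system.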
2.9 I/O Organisation
Input/Output devices can be interfaced with Pentium systems in two ways.
1. I/O mapped I/O
2. Memory mapped I/O
2.9.1 I/O Mapped I/O
In I/O mapped I/O, the I/O devices are treated as separate from memory. The Pentium
supports software and hardware features for separate memory and I/O address spaces.
Fig. 2.8 shows the memory and I/O spaces in real mode.
The Pentium has four special instructions, IN, INS, OUT, and OUTS, to transfer data
through input/output ports in an I/O mapped I/O system. The M/IO signal is always low when
the Pentium is executing these instructions. So the M/IO signal is used to generate separate
addresses for Input/Output. Only 256 (2^8) I/O addresses can be generated when the direct
addressing method is used. By using the indirect addressing method this range can be
extended up to 65536 (2^16) addresses.
Fig. 2.8 Memory and I/O spaces in real mode
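The two I/O address ranges can be summarised in a short helper. The sketch below is illustrative: the function name is hypothetical, and it relies on the usual x86 convention that direct addressing carries an 8-bit port number in the instruction while indirect addressing takes a 16-bit port number from the DX register.

```python
# Which I/O addressing forms can reach a given port number.

def io_addressing_forms(port):
    """Return the addressing forms usable for this port, most compact first."""
    if not 0 <= port <= 0xFFFF:
        raise ValueError("port numbers are 16 bits wide (0000H-FFFFH)")
    forms = ["indirect (port number in DX)"]
    if port <= 0xFF:
        # Only the first 256 ports fit in the 8-bit field of the instruction.
        forms.insert(0, "direct (8-bit port number in the instruction)")
    return forms


if __name__ == "__main__":
    for port in (0x60, 0x3F8):
        print(f"{port:04X}H:", io_addressing_forms(port))
```

Ports above 00FFH can therefore be reached only through the indirect form.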
2.9.2 Memory Mapped I/O
In memory mapped I/O, the I/O device is placed in the memory address space of the
microcomputer system. The I/O device is connected as if it were a memory location. For this
reason, the method is known as memory mapped I/O.
In a microcomputer system with memory-mapped I/O, some of the memory address
space is dedicated to the I/O system. Fig. 2.9 shows memory mapped I/O devices in
the Pentium memory address space. Here, 4096 memory addresses from D0000H through
D0FFFH are assigned to I/O devices. The contents of D0000H represent byte-wide I/O
port 0; the contents of D0000H and D0001H represent word-wide port 0; and the contents of
D0000H through D0003H represent double-word-wide port 0.
Fig. 2.9 Memory mapped I/O devices
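The address window used in this example can be decoded as sketched below. The constants follow the range given in the text (D0000H through D0FFFH); the function names are hypothetical and the code only illustrates the mapping between memory addresses and port numbers.

```python
# Decoding the memory-mapped I/O window D0000H-D0FFFH of the example.

IO_BASE = 0xD0000      # first address assigned to I/O devices
IO_SIZE = 4096         # number of byte-wide ports in the window


def mmio_port(address):
    """Return the byte-wide port number for an address, or None for memory."""
    if IO_BASE <= address < IO_BASE + IO_SIZE:
        return address - IO_BASE
    return None


def port_addresses(port, width_bytes):
    """Return the memory addresses that make up a port of the given width."""
    start = IO_BASE + port
    if start + width_bytes > IO_BASE + IO_SIZE:
        raise ValueError("port extends past the I/O window")
    return [start + i for i in range(width_bytes)]


if __name__ == "__main__":
    print(mmio_port(0xD0FFF), mmio_port(0xD1000))
    print([f"{a:05X}H" for a in port_addresses(0, 4)])
```

An ordinary memory instruction such as MOV reaches these ports, because the processor sees them as memory locations.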
The I/O system for the Pentium is identical to that of the 80386 microprocessor and it is
completely compatible with earlier Intel microprocessors. In the Pentium, the I/O port
number appears on address lines A15 - A3, with the bank enable signals used to select the
actual memory banks used for the I/O transfer.
As in the 80386 microprocessor, the I/O privilege information is added to the TSS segment
when the Pentium is operated in the protected mode. This provides I/O protection and
allows I/O ports to be selectively inhibited. If an inhibited I/O location is accessed, the
Pentium generates a type 13 interrupt to indicate the I/O privilege violation.
2.10 Data Transfer Mechanism - 8-bit, 16-bit, 32-bit and 64-bit
Address Translation
The Pentium's address bus is designed to address 64-bit devices. It consists of the A31 : A3
and BE7 : BE0 signals. Signals BE7 to BE0 are used to select the eight data bytes of the 64-bit
data bus. But in the PC environment not all devices are 64-bit devices. In such cases, the address
requirements depend on the device size, as listed below :
32-bit devices : A31 : A2 and BE3 : BE0
16-bit devices : A31 : A1 and BHE and BLE
8-bit devices : A31 : A0
The Pentium does not support these address requirements directly. External logic is required
for this address translation (Refer Fig. 2.7). The address translation is typically done in the
expansion bus control logic for smaller devices that are integrated onto the system board
or residing in expansion slots.
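The translation performed by the external logic can be sketched in software. In the Python fragment below the byte enables are passed as an 8-bit mask (bit n for BEn, 0 = active), and the lowest active byte enable supplies the missing low address bits. The function names and the dictionary keys are assumptions made for this illustration, and only the first access of a multi-access transfer is shown.

```python
# Deriving the address signals needed by 8-, 16- and 32-bit devices from the
# quadword address (A31:A3) and the byte enables (BE7:BE0, active low).

def lowest_active_enable(be_mask):
    """Return the index of the lowest active byte enable (this gives A2:A0)."""
    active = ~be_mask & 0xFF
    if active == 0:
        raise ValueError("no byte enable is active")
    return (active & -active).bit_length() - 1


def translate(quad_address, be_mask, device_bits):
    """Return the address signals for the first access to the device."""
    first = lowest_active_enable(be_mask)
    byte_address = (quad_address & ~0x7) | first
    if device_bits == 8:
        return {"address": byte_address}
    if device_bits == 16:
        pair = (be_mask >> (first & ~0x1)) & 0x3      # enables of this word
        return {"address": byte_address & ~0x1,
                "BLE": pair & 1, "BHE": (pair >> 1) & 1}
    if device_bits == 32:
        nibble = (be_mask >> (first & ~0x3)) & 0xF    # enables of this dword
        return {"address": byte_address & ~0x3, "BE3_BE0": nibble}
    raise ValueError("device_bits must be 8, 16 or 32")


if __name__ == "__main__":
    # Double word at A004H: quadword address A000H, BE7-BE4 active (low).
    for width in (8, 16, 32):
        print(width, translate(0xA000, 0x0F, width))
```

In this example all three device sizes receive the address A004H, with BHE and BLE (or BE3 : BE0) driven low for the wider devices.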
Data Bus Steering
The external logic must also ensure that information read from and written to 8-, 16-,
and 32-bit devices is transferred over the correct data path(s) (Refer Fig. 2.10).
Fig. 2.10 Address translation for 8, 16, and 32-bit devices
Smaller devices, such as 8-bit devices, connect to the lower data paths only. When the
Pentium processor reads from a device, it expects data from given locations
to be transferred over their respective data paths. Therefore data from a specified address
location must be directed, or steered, to the path over which the Pentium processor expects it.
Conversely, when the Pentium processor writes data to a device, it assumes that the device
is connected to all 8 data paths (that is, a 64-bit device). However, if the device is smaller
than 64 bits, the data paths used by the Pentium processor may not connect to the smaller
device, and again the data must be steered to the correct path. This is implemented with
a series of transceivers that can pass data from one path to another.
Data Bus Steering for 8-Bit Devices
Fig. 2.11 shows the data bus steering logic required for 8-bit devices. As shown in
Fig. 2.11, an 8-bit device connects only to the lower data path (SD7 : SD0).
Fig. 2.11 Data bus steering transceivers required by 8-bit devices
Let us consider the instruction MOV EBX, [A004H] and assume that the memory
device is 8-bit. Since the destination register (EBX) is 32-bit, it is necessary to retrieve 4
bytes from the 8-bit memory device. To execute this instruction, the processor runs a single
memory read bus cycle to get the contents of the four locations starting at memory
location A004H. The processor has no idea that the memory device being accessed is an
8-bit device. To satisfy the processor's request the external logic has to run multiple
bus cycles to access the four bytes of data and keep the BRDY signal inactive so that the processor
waits for these bus cycles to complete. The address translation logic uses the byte enable signals
to generate A2, A1 and A0, which are required to address the 8-bit device. The byte enable
signals also specify the data paths over which the Pentium processor expects the data (in
our case, paths 4, 5, 6 and 7).
Accessing First Byte
The address translation logic converts the quadword address output by the Pentium
processor to the byte address (A004H) required by the 8-bit device. When the 8-bit device is
ready to complete the first transfer, the bus control logic activates the steering logic (path
0/4 transceiver) to transfer the contents of data path 0 to data path 4. The data is then
latched into the latch on data path 4 and the steering logic is disabled.
Accessing Second Byte
The bus control logic increments the address to select the next location (A005H) from
the 8-bit memory device. Again the 8-bit device delivers data on path 0. Now this data is
transferred through the path 0/5 transceiver and latched on the path 5 latch by the steering logic.
The steering logic is then disabled.
Accessing Third Byte
Again the bus logic increments the address to select the next location (A006H). This
time the steering logic directs the data accessed from the 8-bit device to data path 6 by
activating the path 0/6 transceiver and then latching the data on the path 6 latch.
Accessing Fourth Byte
The same process is repeated. In this case the device address is A007H and the data is
latched on the path 7 latch. When all four bytes of data are present on data paths 4 through 7,
the bus control logic asserts the BRDY signal, telling the processor that valid data is
present on the data bus. The processor then resumes its bus cycle, latches the contents
of data paths 4 through 7 and ends the bus cycle.
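The four accesses described above can be modelled in a few lines. The Python sketch below is a behavioural illustration only: the 8-bit device is represented by a dictionary from byte address to value, the byte enables are an 8-bit mask with 0 = active, and the function names and sample data are hypothetical.

```python
# Behavioural model of data bus steering for an 8-bit device.

def read_through_8bit_device(memory, quad_address, be_mask):
    """Return (latched bytes by data path, number of device accesses)."""
    base = quad_address & ~0x7
    latches = {}
    accesses = 0
    for path in range(8):
        if ((be_mask >> path) & 1) == 0:     # BEn low: data expected on path n
            device_address = base | path     # A2:A0 select the byte
            byte_on_path0 = memory[device_address]   # device drives path 0 only
            latches[path] = byte_on_path0    # path 0/n transceiver, then latch n
            accesses += 1
    return latches, accesses


def assemble_little_endian(latches):
    """Combine the latched bytes, lowest data path first, into one value."""
    value = 0
    for i, path in enumerate(sorted(latches)):
        value |= latches[path] << (8 * i)
    return value


if __name__ == "__main__":
    device = {0xA004: 0x78, 0xA005: 0x56, 0xA006: 0x34, 0xA007: 0x12}
    latches, count = read_through_8bit_device(device, 0xA000, 0x0F)
    print(count, f"{assemble_little_endian(latches):08X}H")
```

Only after the last latch is filled would the bus control logic assert BRDY, so the processor still sees a single bus cycle.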
Data Bus Steering for 16-Bit Devices
Fig. 2.12 shows the data steering logic required by a 16-bit device. As shown in Fig. 2.12,
three transceivers are used to transfer data from the lower byte (byte 0) of the 16-bit device to paths
2, 4 and 6, and another three transceivers are used to transfer data from the upper byte (byte 1) of
the 16-bit device to paths 3, 5 and 7. A process similar to the one described for data steering
of an 8-bit device is repeated. If we consider the previous example (MOV EBX, [A004H]), a 16-bit
device requires only two accesses instead of four, since each access transfers two bytes.
Fig. 2.12 Data bus steering transceivers required by 16-bit devices
Accessing First and Second Bytes
The address translation logic converts the quadword address sent by the Pentium
processor to the word address required by the 16-bit device. It also generates the BHE and BLE
signals required to access the 16-bit device. In our case (MOV EBX, [A004H]), the conversion
results in word address A004H with the BHE and BLE signals asserted low. Due to this, the 16-bit
device delivers 16-bit contents : the lower byte, from address A004H, over data path 0
and the upper byte, from address A005H, over data path 1. At this time, the bus control logic
activates the data bus steering logic (path 0/4 and path 1/5 transceivers) to transfer the
contents of data paths zero and one to data paths four and five respectively (Refer
Fig. 2.12). The data is then latched into the latches on data paths four and five and the steering
logic is disabled.
Accessing Third and Fourth Bytes
The bus control logic increments the address to select locations A006H and A007H
from the 16-bit device. With both the BHE and BLE signals asserted, data from the 16-bit device is
delivered over paths zero and one. The steering logic then activates transceivers 0/6 and
1/7 to transfer the contents of paths zero and one. These contents are then latched into the
corresponding latches (Refer Fig. 2.12). When all four bytes of data are present on data
paths 4 through 7, the bus control logic asserts the BRDY signal, telling the processor that
valid data is present on the data bus. The processor then resumes its bus cycle,
latches the contents of data paths 4 through 7 and ends the bus cycle.
Data Bus Steering for 32-bit Devices
Fig. 2.13 shows the data bus steering logic required when the Pentium processor accesses a
32-bit device. As shown in Fig. 2.13, one transceiver is used to transfer data from each
byte to the corresponding higher byte data path. We know that the 32-bit device is capable of
accessing all four bytes within a single bus cycle. However, in the example (MOV EBX,
[A004H]), the data from locations A004H through A007H is delivered over data paths zero
through three, while the Pentium expects the data on data paths four through
seven.
The address translation logic converts the quadword address sent by the Pentium into a
doubleword address. It also generates the BE0 through BE3 signals to access the 32-bit device. In
our case (MOV EBX, [A004H]), the conversion results in doubleword address A004H with the BE0
through BE3 signals asserted low. Due to this, the 32-bit device delivers 32-bit contents over
data paths 0 through 3. The bus control logic then activates the bus steering logic to transfer
all four bytes to the upper paths. The contents of all four bytes are then latched on the
corresponding latches. When all four bytes of data are present on data paths 4 through 7, the
BRDY signal is asserted and the processor completes its bus cycle after latching the contents
from data paths 4 through 7.
Fig. 2.13 Data bus steering transceivers required by 32-bit devices
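The three device sizes differ only in how many bytes each access delivers. The following Python sketch lists, for each device access, which device data path is steered to which processor data path. It assumes an aligned transfer (start address and length are multiples of the device width in bytes); the function name is hypothetical.

```python
# Steering plan for a transfer to or from an 8-, 16- or 32-bit device.

def steering_plan(start_address, num_bytes, device_bits):
    """Return a list of (device address, [(device path, processor path)])."""
    width = device_bits // 8
    if device_bits not in (8, 16, 32):
        raise ValueError("device_bits must be 8, 16 or 32")
    if start_address % width or num_bytes % width:
        raise ValueError("this sketch handles aligned transfers only")
    plan = []
    for address in range(start_address, start_address + num_bytes, width):
        # Device path n carries the byte at address + n; the processor
        # expects that byte on the path given by the low three address bits.
        pairs = [(lane, (address + lane) & 0x7) for lane in range(width)]
        plan.append((address, pairs))
    return plan


if __name__ == "__main__":
    for bits in (8, 16, 32):
        plan = steering_plan(0xA004, 4, bits)
        print(f"{bits}-bit device: {len(plan)} access(es)")
        for address, pairs in plan:
            print(f"  {address:04X}H ->", pairs)
```

For the MOV EBX, [A004H] example this gives four, two and one device accesses respectively, in agreement with the descriptions above.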
Review Questions
1. Give the contents of various registers of Pentium processor immediately after hardware reset.
2. Write a short note on bus operations of Pentium processor.
3. Explain various bus cycle states of Pentium processor.
4. Draw the state transition diagram for Pentium processor.
5. What is non-pipelined and pipelined bus cycle ?
6. Explain the non-pipelined read cycle with the help of timing diagram.
7. Explain the non-pipelined write cycle with the help of timing diagram.
8. Explain the pipelined read cycle with the help of timing diagram.
9. Explain the pipelined write cycle with the help of timing diagram.
10. Explain burst cycle of Pentium processor.
11. Write short notes on -
a) Memory organisation of Pentium processor
b) I/O organisation of Pentium processor.
12. What is data bus steering ?
13. Explain the data bus steering for 8-bit devices.
14. Explain the data bus steering for 16-bit devices.
15. Explain the data bus steering for 32-bit devices.
Pentium Programming
3.1 Introduction
In this chapter we study the programming environment of the Pentium
processor. It includes the programmer's model, register set, addressing modes,
data types and instruction set supported by the Pentium processor.
3.2 Programmer's Model
The programming model makes it easier to understand the processor in a
programming environment. The Pentium processor can be operated in three basic modes : real
mode, protected mode and virtual 8086 mode. We introduced the real mode
programmer's model of the Pentium processor in Chapter 1. The protected mode programmer's
model of the Pentium processor includes some more registers. Let us study the protected
mode programmer's model of the Pentium processor with a detailed description of each register.
Fig. 3.1 shows the programmer's model for the Pentium processor. In the figure, only
the shaded portion is a part of real mode. It consists of eight 16-bit registers (IP, CS, DS,
SS, ES, FS, GS and Flag register) and eight 32-bit registers (EAX, EBX, ECX, EDX, ESP,
EBP, ESI, EDI). In real mode, the Pentium can access CR0, which is used to enter the
protected mode. The Protection Enable bit (PE) is used to switch the Pentium from real to
protected mode. The registers in the programmer's model of the Pentium processor can be
categorised according to their usage as given below.
1. General purpose registers
2. Segment registers
3. Index, pointers and base registers
4. Flag registers
5. System address registers
6. Control registers
7. Debug registers
8. Test registers
Fig. 3.1 Programmer's model of Pentium processor
3.2.1 General Purpose Registers
The Pentium contains 32-bit general purpose registers EAX, EBX, ECX, EDX, ESI, EDI,
EBP, and ESP to hold the following items:
• Operands for logical and arithmetic operations
• Operands for address calculations
• Memory pointers
Although all of these registers are available for general storage of operands, results,
and pointers, caution should be used when referencing the ESP register. The ESP register
holds the stack pointer and as a general rule should not be used for another purpose.
Many instructions assign specific registers to hold operands. For example, string
instructions use the contents of the ECX, ESI, and EDI registers as operands. When using a
segmented memory model, some instructions assume that pointers in certain registers are
relative to specific segments. For instance, some instructions assume that a pointer in the
EBX register points to a memory location in the DS segment.
Fig. 3.2 shows the general purpose registers in the Pentium. The lower 16 bits of each
general purpose register can be accessed individually. These 16-bit registers are
accessed as AX, BX, CX, DX, SP, BP, SI, and DI respectively. The AX, BX, CX and DX
registers can be further divided into two separate bytes : a higher byte and a lower byte. For
example, AX consists of AH (higher byte) and AL (lower byte). These bytes can be individually
accessed as AH, AL, BH, BL, CH, CL, DH, and DL.
Note : A register name beginning with an E (for example EAX) indicates that the register width
is 32-bit. A register name ending with an X (for example AX) indicates a 16-bit
register, and a register name ending with H or L (for example AH or AL) indicates
an 8-bit register.
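The relation between EAX, AX, AH and AL can be illustrated with masks and shifts. The Python sketch below treats the register as a plain 32-bit integer; the function names are hypothetical and the code models only the bit layout, not instruction behaviour.

```python
# Sub-register views of a 32-bit general purpose register.

def split_register(eax):
    """Return (AX, AH, AL) for a 32-bit register value."""
    eax &= 0xFFFFFFFF
    ax = eax & 0xFFFF          # lower 16 bits
    ah = (ax >> 8) & 0xFF      # higher byte of AX
    al = ax & 0xFF             # lower byte of AX
    return ax, ah, al


def write_al(eax, value):
    """Return EAX after writing AL; the upper 24 bits are unchanged."""
    return (eax & 0xFFFFFF00) | (value & 0xFF)


if __name__ == "__main__":
    ax, ah, al = split_register(0x12345678)
    print(f"AX={ax:04X}H AH={ah:02X}H AL={al:02X}H")
    print(f"{write_al(0x12345678, 0xAB):08X}H")
```

The same layout applies to EBX, ECX and EDX; for ESP, EBP, ESI and EDI only the 16-bit view exists.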
The other four general purpose registers are the two pointer registers, ESP and EBP,
and the two index registers, ESI and EDI. These registers are used for special functions.
They are used to store offset addresses of memory locations relative to the segment
registers. The index registers ESI and EDI are used to store offset values to be incremented
or decremented when stepping through a block of data. The index registers are also used to
hold offset addresses for instructions that access data stored in the data segment part of
memory. Thus these registers can be combined with the values in the DS register using
index addressing. The pointer registers ESP and EBP are used to store offset addresses of
memory locations relative to the stack segment register.
The summary of special uses of general purpose registers is as follows :
EAX - Accumulator for operands and results data
EBX - Pointer to data in the DS segment
ECX - Counter for string and loop operations
EDX - I/O pointer
ESI - Pointer to data in the segment pointed to by the DS register; source pointer for
string operations
EDI - Pointer to data (or destination) in the segment pointed to by the ES register;
destination pointer for string operations
ESP - Stack pointer (in the SS segment)
EBP - Pointer to data on the stack (in the SS segment)
Fig. 3.2 General purpose registers
3.2.2 Segment Registers
The segment registers (CS, DS, SS, ES, FS, and GS) hold 16-bit segment
selectors. A segment selector is a special pointer that identifies a
segment in memory. To access a particular segment in memory, the
segment selector for that segment must be present in the appropriate
segment register. Fig. 3.3 shows the segment registers.
Register (Bit 15 - Bit 0)    Segment
CS                           Code segment
DS                           Data segment
SS                           Stack segment
ES                           Extra segment
FS                           Extra segment
GS                           Extra segment
Fig. 3.3 Segment registers
1. The CS (Code Segment) register holds the base address of the currently active code
segment.
2. The DS (Data Segment) register is used to hold the address of the currently active data
segment.
3. The ES (Extra Segment), FS, and GS registers are used as general data segment registers.
These registers hold the base addresses of three different memory segments. These
segments are referred to as Extra Segments.
4. The base address of the currently active stack segment is contained in the SS (Stack
Segment) register.
3.2.3 Index, Pointers, and Base Registers
As mentioned earlier, the physical address of any memory location within a selected
memory segment is obtained by adding the segment address and the offset. The contents
of the segment register are shifted left by 4 and the offset is added to the shifted contents of
the segment register to generate the physical address. The offset used to calculate the physical
address is contained in any of the pointer, base, or index registers. Table 3.1 shows the
segment registers and the offset registers used with the corresponding segments.
Segment Register                   Offset Registers
CS (Code Segment)                  (E)IP (Instruction Pointer)
SS (Stack Segment)                 (E)SP (Stack Pointer)
                                   (E)BP (Base Pointer)
DS (Data Segment)                  (E)BX (Base Register)
                                   (E)SI (Source Index Register)
                                   (E)DI (Destination Index Register)
ES, FS and GS (Extra Segments)     (E)BX (Base Register)
                                   (E)SI (Source Index Register)
Table 3.1 Segments and offset registers
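The shift-and-add calculation described above can be written out directly. This Python sketch applies to the real mode calculation with 16-bit segment and offset values; the function name and the sample register contents are chosen only for illustration.

```python
# Real mode physical address: (segment register shifted left by 4) + offset.

def physical_address(segment, offset):
    """Return the physical address for a 16-bit segment and 16-bit offset."""
    return ((segment & 0xFFFF) << 4) + (offset & 0xFFFF)


if __name__ == "__main__":
    # Hypothetical contents: DS = 2000H, BX = 0100H
    print(f"{physical_address(0x2000, 0x0100):05X}H")
```

Shifting left by 4 is the same as multiplying the segment value by 16, so consecutive segment values start 16 bytes apart.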
3.2.4 EFLAGs Register
A Flag is a flip-flop which indicates some condition produced by the execution of an
instruction or controls certain operations of the EU. The 32-bit EFLAGS register contains a
group of status flags, a control flag, and a group of system flags. The Fig. 3.4 defines the
flags within this register.
Bit(s)    Flag
31-22     Reserved (0)
21        ID Flag (ID)
20        Virtual Interrupt Pending (VIP)
19        Virtual Interrupt Flag (VIF)
18        Alignment Check (AC)
17        Virtual 8086 Mode (VM)
16        Resume Flag (RF)
15        Reserved (0)
14        Nested Task (NT)
13-12     I/O Privilege Level (IOPL)
11        Overflow Flag (OF)
10        Direction Flag (DF)
9         Interrupt Enable Flag (IF)
8         Trap Flag (TF)
7         Sign Flag (SF)
6         Zero Flag (ZF)
5         Reserved (0)
4         Auxiliary Carry Flag (AF)
3         Reserved (0)
2         Parity Flag (PF)
1         Reserved (1)
0         Carry Flag (CF)
Note : All bits shown with a one or a zero are Intel reserved. They must always be set to
the values previously read from them.
Fig. 3.4 EFLAGs register
These flags can be categorized in three different groups :
1. Status flags : These flags reflect the state of a particular program.
2. Control flags : These flags directly affect the operation of a few instructions.
3. System flags : These flags reflect the current status of the machine and are
usually used by the operating system rather than by application programs.
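The individual flags can be extracted from a 32-bit EFLAGS value with shifts and masks. The Python sketch below uses the standard bit positions of the flags named in Fig. 3.4; the dictionary and function names are hypothetical, and the sample values are chosen only for illustration.

```python
# Decoding an EFLAGS value into its named flags.

FLAG_BITS = {
    "CF": 0, "PF": 2, "AF": 4, "ZF": 6, "SF": 7,        # status flags
    "TF": 8, "IF": 9, "DF": 10, "OF": 11, "NT": 14,
    "RF": 16, "VM": 17, "AC": 18, "VIF": 19, "VIP": 20, "ID": 21,
}


def decode_eflags(value):
    """Return a dictionary of flag name -> value for a 32-bit EFLAGS word."""
    flags = {name: (value >> bit) & 1 for name, bit in FLAG_BITS.items()}
    flags["IOPL"] = (value >> 12) & 0x3      # two-bit I/O privilege level
    return flags


if __name__ == "__main__":
    after_reset = decode_eflags(0x00000002)  # only reserved bit 1 is set
    print(all(v == 0 for v in after_reset.values()))
    sample = decode_eflags(0x00000246)
    print([name for name, v in sample.items() if v])
```

The reset value 2 from Table 2.1 decodes to all flags cleared, since only the reserved bit 1 is set.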
3.2.4.1 Status Flags
The status flags are : CF (Carry flag), PF (Parity flag), AF (Auxiliary carry flag), ZF
(Zero flag), SF (Sign flag), and OF (Overflow flag). These flags indicate some condition
produced by the execution of arithmetic or logical instructions and provide the
information needed for arithmetic and logical control decisions.
CF (Bit 0) Carry flag :
This bit is set by arithmetic instructions that generate either a carry or a borrow. This
bit can also be set, cleared, or inverted with the STC, CLC or CMC instructions,
respectively. Carry flag is also used in shift and rotate instructions to contain the bit
shifted or rotated out of the register.
PF (Bit 2) Parity flag :
The parity flag is set by most instructions if the least significant 8 bits of the result
contain an even number of ones.
AF (Bit 4) Auxiliary carry flag :
This bit is set when there is a carry or borrow after a nibble addition or subtraction,
respectively. No conditional jump instruction tests this bit directly; it is used internally
for BCD arithmetic.
ZF (Bit 6) Zero flag :
Zero flag is set to 1, if the result of an operation is zero.
SF (Bit 7) Sign flag :
In signed numbers, the most significant bit (MSB) indicates the sign of the number. For a
negative number the MSB is 1. The sign flag is set to 1 if the result of an operation is
negative (MSB = 1).
OF (Bit 11) Overflow flag :
In 2's complement arithmetic, the most significant bit is used to represent the sign and
the remaining bits are used to represent the magnitude of a number (see Fig. 3.5). This
flag is set if the result of a signed operation is too large to fit in the number of bits
available (7 bits for an 8-bit number) to represent it.
[Figure 3.5: the most significant bit holds the sign (S); the remaining bits hold the magnitude.]
Fig. 3.5 Sign and magnitude representation
For example, suppose you add the 8-bit signed number 01110110 (+118 decimal) and the 8-bit
signed number 00110110 (+54 decimal). The result is 10101100 (172 decimal when read as an
unsigned number), which is the correct binary result. But in this case, it is too large to
fit in the 7 bits allowed for the magnitude in an 8-bit signed number; read as a signed
number, 10101100 is -84. The overflow flag will be set after this operation to indicate
that the result of the addition has overflowed into the sign bit.
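The status flag definitions above can be checked against this example with a short Python sketch. It models only an 8-bit addition; the function name add8_flags and the dictionary layout are hypothetical, chosen for this illustration.

```python
def add8_flags(a: int, b: int) -> dict:
    """Add two 8-bit values and return the result with the six status flags."""
    a &= 0xFF
    b &= 0xFF
    total = a + b
    result = total & 0xFF
    return {
        "result": result,
        "CF": int(total > 0xFF),                     # carry out of bit 7
        "PF": int(bin(result).count("1") % 2 == 0),  # even number of ones
        "AF": int((a & 0xF) + (b & 0xF) > 0xF),      # carry out of the low nibble
        "ZF": int(result == 0),
        "SF": result >> 7,                           # copy of the MSB
        # Overflow: both operands have the same sign and the result's sign differs.
        "OF": int(((a ^ result) & (b ^ result) & 0x80) != 0),
    }


flags = add8_flags(0b01110110, 0b00110110)  # +118 and +54
print(format(flags["result"], "08b"), flags)
```

For the two operands of the example, the sketch reports the result 10101100 with OF = 1 and SF = 1, while CF stays 0 because no carry leaves bit 7.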
3.2.4.2 Control Flags
DF (Bit 10) ( Direction flag) :
The direction flag controls the direction of string operations. When the D flag is
cleared, these operations process strings from low memory up towards high memory. This
means that the offset pointers (usually SI and DI) are incremented after each operation in
the string instructions (by 1 for byte operations) when the D flag is cleared. If the D
flag is set, then SI and DI are decremented after each operation to process strings from
high to low memory.
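The effect of the direction flag on a byte string copy can be illustrated with a small Python model. This is a simplified sketch: the function name movsb is hypothetical, memory is modelled as a flat bytearray, and segment registers are ignored.

```python
def movsb(memory: bytearray, si: int, di: int, count: int, df: int):
    """Copy count bytes from memory[si] to memory[di]; return the final (si, di)."""
    step = -1 if df else 1  # DF = 0: pointers move up, DF = 1: pointers move down
    for _ in range(count):
        memory[di] = memory[si]
        si += step
        di += step
    return si, di


mem = bytearray(b"ABCD....")
print(movsb(mem, 0, 4, 4, df=0), mem)  # copy upwards from offset 0

mem = bytearray(b"ABCD....")
print(movsb(mem, 3, 7, 4, df=1), mem)  # copy downwards from offset 3
```

Both calls leave the same bytes in memory; only the order in which the bytes are visited and the final pointer values differ.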
3.2.4.3 System Flags
VM (Bit 17) Virtual 8086 Mode flag :
This flag indicates the operating mode of the Pentium. When the VM flag is set, the
Pentium switches from protected mode to virtual 8086 mode.
RF (Bit 16) Resume flag :
This flag, when set, allows selective masking of some exceptions at the time of
debugging.
NT (Bit 14) Nested Task flag :
This flag is set when one system task invokes another task (i.e. a nested task).
IOPL (Bits 12 and 13) I/O Privilege Level :
The two bits in the IOPL field are used by the processor and the operating system to
determine an application's access to I/O facilities. The field holds the privilege level,
from 0 to 3, that the current code needs in order to execute any I/O related instruction.
IF (Bit 9) Interrupt Flag :
When interrupt flag is set, the pentium recognizes and handles external hardware
interrupts on its INTR pin. If the interrupt flag is cleared, pentium ignores any inputs on
this pin. The IF flag is set and cleared with the STI and CLI instructions, respectively.
TF (Bit 8) Trap Flag :
The trap flag allows the user to single-step through programs. When the Pentium detects
that this flag is set, it executes one instruction and then automatically generates an
internal exception 1. After servicing the exception, the processor executes the next
instruction and repeats the process. This single stepping continues until program code
resets this flag. The single-step facility is used for debugging programs.
AC (bit 18) Alignment Check Flag :
Alignment checking of memory references can be enabled by setting the AC flag along
with the AM bit in the CR0 register. Alignment checking of memory references is disabled
when either the AC flag or the AM bit (or both) is cleared.
ID (bit 21) Identification flag :
The ability of a program to set or clear this flag indicates support for the CPUID
instruction.
VIF (bit 19) Virtual interrupt flag :
Virtual image of the IF flag. Used in conjunction with the VIP flag. (To use this flag
and the VIP flag, the virtual mode extensions are enabled by setting the VME flag in
control register CR4.)
VIP (Bit 20) Virtual interrupt pending flag :
Set to indicate that an interrupt is pending; clear when no interrupt is pending.
(Software sets and clears this flag; the processor only reads it.) Used in conjunction with
the VIF flag.
3.2.5 More about EFLAGs
Following the initialization of the processor (either by asserting the RESET pin or the
INIT pin), the state of the EFLAGS register is 00000002H. Bits 1, 3, 5, 15, and 22 through
31 of this register are reserved. Software should not use or depend on the states of any of
these bits.
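As an illustration of the bit positions listed in section 3.2.4, the following Python sketch decodes a 32-bit EFLAGS value into the names of the flags that are set. The names EFLAGS_BITS and decode_eflags are hypothetical, and the second test value (00000246H) is only a sample chosen for the example.

```python
# Bit positions as given in section 3.2.4 (IOPL is a 2-bit field at bits 12 and 13).
EFLAGS_BITS = {
    "CF": 0, "PF": 2, "AF": 4, "ZF": 6, "SF": 7, "TF": 8, "IF": 9,
    "DF": 10, "OF": 11, "NT": 14, "RF": 16, "VM": 17, "AC": 18,
    "VIF": 19, "VIP": 20, "ID": 21,
}


def decode_eflags(value: int):
    """Return (list of flag names that are set, IOPL field value)."""
    set_flags = [name for name, bit in EFLAGS_BITS.items() if (value >> bit) & 1]
    iopl = (value >> 12) & 0b11
    return set_flags, iopl


print(decode_eflags(0x00000002))  # state after reset: only reserved bit 1 is set
print(decode_eflags(0x00000246))  # sample value: PF, ZF and IF set
```

For the reset value 00000002H the sketch reports no flags set and an IOPL of 0, because the only bit that is 1 is the reserved bit 1.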
Some of the flags in the EFLAGS register can be modified directly, using
special-purpose instructions (described in the following sections). There are no instructions
that allow the whole register to be examined or modified directly.
The following instructions can be used to move groups of flags to and from the
procedure stack or the EAX register: LAHF, SAHF, PUSHF, PUSHFD, POPF, and POPFD.
After the contents of the EFLAGS register have been transferred to the procedure stack or
EAX register, the flags can be examined and modified using the processor's bit
manipulation instructions (BT, BTS, BTR, and BTC).
When suspending a task (using the processor’s multitasking facilities), the processor
automatically saves the state of the EFLAGS register in the task state segment (TSS) for the
task being suspended. When binding itself to a new task, the processor loads the EFLAGS
register with data from the new task’s TSS.
When a call is made to an interrupt or exception handler procedure, the processor
automatically saves the state of the EFLAGS register on the procedure stack. When an
interrupt or exception is handled with a task switch, the state of the EFLAGS register is
saved in the TSS for the task being suspended.
3.2.6 System Address Registers
There are four system address registers : TR (Task Register), IDTR (Interrupt
Descriptor Table Register), GDTR (Global Descriptor Table Register) and LDTR (Local
Descriptor Table Register). Fig. 3.6 shows these special registers which are used in
protected mode. These registers hold the addresses for the four special descriptor table
segments. The TR (Task Register) points to the Task state segment. The IDTR (Interrupt
Descriptor Table Register) points to the Interrupt Descriptor Table (IDT). The GDTR
(Global Descriptor Table Register) points to the Global Descriptor Table (GDT). The
LDTR (Local Descriptor Table Register) points to the Local Descriptor Table (LDT).
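[Figure 3.6: GDTR and IDTR are drawn with bit positions 47, 15 and 0 marked; LDTR and TR
are drawn with bit positions 15 and 0 marked.]
Fig. 3.6 Protected mode registers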
3.2.7 System Registers
To assist in initializing the processor and controlling system operations, the system
architecture provides system flags in the EFLAGS register and several system registers.
These include control registers, debug registers, test registers and model-specific registers.
The control registers (CR0, CR2, CR3, and CR4) contain a variety of flags and data
fields for controlling system-level operations. Other flags in these registers are used to
indicate support for specific processor capabilities within the operating system or
executive.
The debug registers allow the setting of breakpoints for use in debugging programs
and systems software.
The task register contains the linear address and size of the TSS for the current task.
The model-specific registers (MSRs) are a group of registers available primarily to
operating system or executive procedures (that is, code running at privilege level 0). These
registers control items such as the debug extensions, the performance-monitoring counters,
the machine check architecture, and the memory type ranges (MTRRs).
3.2.7.1 Control Registers
Control registers determine the operating mode of the processor and the characteristics of
the currently executing task. These registers are 32 bits wide in all 32-bit modes and
compatibility mode. There are five control registers : CR0, CR1, CR2, CR3 and CR4.
Fig. 3.7 shows the control registers. These registers define the machine state that affects
all the tasks in the system.
Control Register 0 (CR0)
Control Register 0 contains system control flags that control the operating mode and states
of the processor. It holds the MSW (Machine Status Word). It contains the following bits : PE
(Protection Enable), MP (Math Present), EM (Emulate Coprocessor), TS (Task Switched), ET
(Extension Type), NE (Numeric Error), WP (Write Protect), AM (Alignment Mask), NW
(Not Write-through), CD (Cache Disable) and PG (Paging).
[Figure 3.7: bit layouts of the control registers CR4, CR3 (page-directory base, PCD and
PWT), CR2 (page-fault linear address), CR1 and CR0.]
Fig. 3.7 Control Registers
PE (Bit 0) Protection Enable :
This bit is similar to the VM bit in EFLAGS in that it controls the Pentium's mode of
operation. When PE is set, the Pentium operates in protected mode; otherwise it operates
in real mode.
MP (Bit 1) Math Present :
When this bit is set, the Pentium assumes that real floating point hardware (80287 or
80387) is present in the system. When this bit is clear, the Pentium assumes that no
such coprocessor exists, and will not attempt to use real floating point hardware.
EM (Bit 2) Emulate Coprocessor :
When this bit is set, the Pentium generates exception 7 (device not available)
whenever it attempts to execute a floating point instruction. The programmer can use
the handler for this exception to emulate floating point hardware in software.
TS (Bit 3) Task Switched :
The Pentium sets this bit automatically every time it performs a task switch. It
never clears this bit on its own, but the programmer can clear it using the CLTS
instruction.
ET (Bit 4) Extension Type :
When power is applied, the Pentium detects whether the numeric processor connected is
an 80287 or an 80387 and sets ET to logic 1 if the numeric processor is an 80387. This is
necessary because the 80387 uses a slightly different protocol than the 80287.
NE (Bit 5) Numeric Error :
When set, enables the internal mechanism for reporting x87 FPU errors; when clear,
enables the PC-style x87 FPU error reporting mechanism. When the NE flag is clear and
the IGNNE input is asserted, x87 FPU errors are ignored. When the NE flag is clear and
the IGNNE input is deasserted, an unmasked x87 FPU error causes the processor to assert
the FERR pin to generate an external interrupt and to stop instruction execution
immediately before executing the next waiting floating-point instruction or WAIT/FWAIT
instruction.
WP (Bit 16) Write Protect :
When set, inhibits supervisor-level procedures from writing into user-level read-only
pages; when clear, allows supervisor-level procedures to write into user-level read-only
pages.
AM (Bit 18) Alignment Mask :
When set, enables automatic alignment checking; when clear, disables alignment
checking. Alignment checking is performed only when the AM flag is set, the AC flag in
the EFLAGS register is set, CPL is 3, and the processor is operating in either protected
or virtual-8086 mode.
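The four conditions for alignment checking can be written as a single test. The Python sketch below is only an illustration; the function name alignment_check_active and the parameter protected_or_v86 are hypothetical.

```python
AM_BIT = 18  # Alignment Mask bit in CR0
AC_BIT = 18  # Alignment Check flag in EFLAGS


def alignment_check_active(cr0: int, eflags: int, cpl: int,
                           protected_or_v86: bool = True) -> bool:
    """Return True only when all four conditions described above hold."""
    am = (cr0 >> AM_BIT) & 1
    ac = (eflags >> AC_BIT) & 1
    return bool(am and ac and cpl == 3 and protected_or_v86)


print(alignment_check_active(1 << 18, 1 << 18, cpl=3))  # True
print(alignment_check_active(1 << 18, 1 << 18, cpl=0))  # False: CPL is not 3
print(alignment_check_active(0, 1 << 18, cpl=3))        # False: AM is clear
```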
NW (Bit 29) Not Write-Through and CD (Bit 30) Cache Disable :
Table 3.2 shows the interpretation of the CD and NW bits within CR0.
CD  NW  Description
1   1   Read hits access the cache.
        Read misses do not cause line fills.
        Write hits update the cache, but not external memory.
        Write hits cause Exclusive (E) state lines to change to Modified (M) state.
        Shared lines remain in the Shared (S) state after write hits.
        Write misses access memory.
        Inquire and invalidation cycles do not affect the cache contents or state.
1   0   Read hits access the cache.
        Read misses do not cause line fills.
        Write hits update the cache.
        Writes to S state lines and write misses update external memory.
        Writes to S state lines change to the E state when WB/WT = 1.
        Inquire and invalidation cycles affect the cache contents and state.
0   1   Illegal combination: results in General Protection (GP) fault (0).
0   0   Read hits access the cache.
        Read misses cause line fills if CACHE and KEN are asserted.
        Cache lines are initially entered in the E or S state depending on the state of
        WB/WT (E = 1, S = 0).
        Write hits update the cache.
        Only writes to S state lines and write misses access external memory.
        Writes to S state lines change to E state when WB/WT = 1.
        Inquire and invalidation cycles affect cache contents and state.
Table 3.2 Interpretation of the CD and NW bits within CR0
PG (Bit 31) Paging :
This bit enables or disables the paging mechanism in the Memory Management Unit
(MMU). If the bit is set, paging is enabled; if it is cleared, paging is disabled.
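The CR0 bit assignments described above can be collected in a small decoding sketch. The names CR0_BITS, decode_cr0 and describe_mode are hypothetical, and the sample values are chosen only for illustration; the sketch does not model invalid combinations of bits.

```python
# Bit positions of the CR0 flags as listed in this section.
CR0_BITS = {"PE": 0, "MP": 1, "EM": 2, "TS": 3, "ET": 4, "NE": 5,
            "WP": 16, "AM": 18, "NW": 29, "CD": 30, "PG": 31}


def decode_cr0(value: int) -> dict:
    """Return a dictionary mapping each CR0 flag name to 0 or 1."""
    return {name: (value >> bit) & 1 for name, bit in CR0_BITS.items()}


def describe_mode(value: int) -> str:
    """Summarise the operating mode selected by the PE and PG bits."""
    flags = decode_cr0(value)
    if not flags["PE"]:
        return "real mode"
    return "protected mode, paging " + ("enabled" if flags["PG"] else "disabled")


print(describe_mode(0x00000010))  # PE = 0
print(describe_mode(0x80000011))  # PE = 1 and PG = 1
```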
Control Register 1 (CR1)
This is reserved by Intel.
Control Register 2 (CR2)
Control Register 2 contains the page-fault linear address (the linear address that caused
a page fault). Software normally only reads CR2; the Pentium itself writes the 32-bit
linear address that caused the most recent page fault into this register. When a page
fault occurs, the Pentium generates exception 14 (page fault). This address is important
for writing the page fault routine, because it helps the programmer to find the cause of
the fault.
Control Register 3 (CR3)
Control register 3 holds the physical address of the root of the two-level paging tables
used when paging is enabled. It is also called Page Directory Base Register (PDBR). Only
the most-significant bits (less the lower 12 bits) of the base address are specified; the lower
12 bits of the address are assumed to be 0. The page directory must thus be aligned to a
page (4-KByte) boundary. The PCD and PWT flags control caching of the page directory in
the processor’s internal data caches (they do not control TLB caching of page-directory
information).
PCD (Bit 4) Page-level Cache Disable :
It controls caching of the current page directory. When the PCD flag is set, caching
of the page directory is prevented; when the flag is clear, the page directory can be
cached. This flag affects only the processor's internal caches (both L1 and L2, when
present). The processor ignores this flag if paging is not used (the PG flag in register
CR0 is clear) or the CD (cache disable) flag in CR0 is set.
PWT (Bit 3) Page-level Writes Transparent :
It controls the write-through or write-back caching policy of the current page
directory. When the PWT flag is set, write-through caching is enabled; when the flag
is clear, write-back caching is enabled. This flag affects only internal caches (both L1
and L2, when present). The processor ignores this flag if paging is not used (the PG
flag in register CR0 is clear) or the CD (cache disable) flag in CR0 is set.
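The layout of CR3 described above (a page-aligned base address in the upper 20 bits, with PCD at bit 4 and PWT at bit 3) can be illustrated as follows. The function name decode_cr3 and the sample value are hypothetical.

```python
def decode_cr3(cr3: int) -> dict:
    """Split a 32-bit CR3 value into the fields described in the text."""
    return {
        # The lower 12 bits of the base address are assumed to be 0 (4-KByte alignment).
        "page_directory_base": cr3 & 0xFFFFF000,
        "PCD": (cr3 >> 4) & 1,
        "PWT": (cr3 >> 3) & 1,
    }


fields = decode_cr3(0x00123018)  # sample value
print(hex(fields["page_directory_base"]), fields["PCD"], fields["PWT"])
```

For the sample value 00123018H the page directory base is 00123000H and both PCD and PWT are set.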
Control Register 4 (CR4)
Control Register 4 contains a group of flags that enable several architectural extensions,
and indicate operating system or executive support for specific processor capabilities. The
control registers can be read and loaded (or modified) using the
move-to-or-from-control-registers forms of the MOV instruction. In protected mode, the
MOV instructions allow the control registers to be read or loaded (at privilege level 0
only). This restriction means that application programs or operating-system procedures
(running at privilege levels 1, 2, or 3) are prevented from reading or loading the control
registers.
VME (Bit 0) Virtual-8086 Mode Extensions :
When set, it enables interrupt- and exception-handling extensions in virtual-8086
mode; when clear, it disables the extensions. Use of the virtual mode extensions
can improve the performance of virtual-8086 applications by eliminating the
overhead of calling the virtual-8086 monitor to handle interrupts and exceptions that
occur while executing an 8086 program and, instead, redirecting the interrupts and
exceptions back to the 8086 program's handlers. It also provides hardware support
for a virtual interrupt flag (VIF) to improve reliability of running 8086 programs in
multitasking and multiple-processor environments.
PVI (Bit 1) Protected-Mode Virtual Interrupts :
When set it enables hardware support for a virtual interrupt flag (VIF) in protected
mode and disables the VIF flag in protected mode when clear.
TSD (Bit 2) Time Stamp Disable :
When set it restricts the execution of the RDTSC instruction to procedures running
at privilege level 0 and allows RDTSC instruction to be executed at any privilege
level when clear.
DE (Bit 3) Debugging Extensions :
When set, references to debug registers DR4 and DR5 cause an undefined opcode
exception to be generated; when clear, the processor aliases references to registers
DR4 and DR5 for compatibility with software written to run on earlier processors
of the Intel 32-bit family.
PSE (Bit 4) Page Size Extensions :
When set it enables 4-MByte pages and restricts pages to 4 KBytes when clear.
PAE (Bit 5) Physical Address Extension :
When set, enables the paging mechanism to reference physical addresses of 36 bits or
more. When clear, restricts physical addresses to 32 bits.
MCE (Bit 6) Machine-Check Enable :
When set, enables the machine-check exception; when clear, disables the machine-check
exception.
PGE (Bit 7) Page Global Enable : (Introduced in the P6 family processors.)
When set, enables the global page feature; when clear, disables the global page feature.
The global page feature allows frequently used or shared pages to be marked
as global to all users (done with the global flag, bit 8, in a page-directory or
page-table entry). Global pages are not flushed from the translation-lookaside buffer
(TLB) on a task switch or a write to register CR3.
3.2.7.2 Debug Registers
Debug registers allow the Pentium to provide debugging features. The registers DR0 to DR7
are used to control the debug feature. The debug registers DR0 to DR3 each contain the
address associated with one of four breakpoints defined by certain bits in debug register 7
(DR7). Fig. 3.8 shows the debug registers. The software debugger can load breakpoint
addresses in these registers to aid in debugging.
[Figure 3.8: layout of the debug registers DR7, DR6, DR5, DR4 and DR3 to DR0. DR3 to DR0
hold the linear addresses of breakpoints 3 to 0; shaded fields are reserved.]
Fig. 3.8 Debug registers
These registers can be written to and read using the move to or from debug register
form of the MOV instruction. A debug register may be the source or destination operand
for one of these instructions. The debug registers are privileged resources; a MOV
instruction that accesses these registers can only be executed in real-address mode, in
SMM, or in protected mode at a CPL of 0. An attempt to read or write the debug registers
from any other privilege level generates a general protection exception.
Debug Registers 0 through 3
The first four debug registers (DR0 - DR3) hold four linear addresses for breakpoints.
The addresses in these registers are compared with the address of each instruction at the
time of instruction execution and if a match is found, an exception 1 (debug fault) is
generated. This allows the Pentium to monitor up to four different addresses in the system.
For each breakpoint, the following information can be specified and detected with the
debug registers :
• The linear address where the breakpoint is to occur.
• The length of the breakpoint location (1, 2, or 4 bytes).
• The operation that must be performed at the address for a debug exception to be
generated.
• Whether the breakpoint is enabled.
• Whether the breakpoint condition was present when the debug exception was
generated.
Debug Registers 4 and 5
Registers 4 and 5 are undefined.
Debug Register 6
Debug register 6 is also called debug status register. This register is updated only
when an exception is generated. The pentium sets the appropriate bits in this register
which gives information of the probable causes for the last debug fault (Exception 1). The
pentium never clears these bits. Programmer must clear these status bits by writing into
DR6. The status bits are :
B0 - B3 Breakpoint Condition Detected :
When set, each bit indicates that its associated breakpoint condition was met when a
debug exception was generated. These flags are set if the condition described for
each breakpoint by the LENn and R/Wn flags in debug control register DR7 is true.
They are set even if the breakpoint is not enabled by the Ln and Gn flags in register
DR7.
BD (Bit 13) Break For Debug Register Access :
Access to the debug registers can be locked by setting the GD bit in DR7. The BD
bit, when set, indicates that exception 1 was invoked because the processor tried to
access a debug register even though access was locked.
BS (Bit 14) Break For Single Step :
This bit is set if the Pentium has invoked exception 1 because the trap flag is set
(TF bit is set in EFLAGS).
BT (Bit 15) Break for task switch :
When set, this flag indicates that the debug exception resulted from a task switch
where the T flag (debug trap flag) in the TSS of the target task was set.
Debug Register 7
It controls the debug feature. By programming bits in this register, programmer can
configure the debug operation of the four linear address breakpoints. Each breakpoint is
controlled by a set of four fields. These are :
L0 - L3 (Bits 0, 2, 4, and 6) Local Enable :
When one of these bits is set, the breakpoint address in the corresponding register DRn
is monitored as long as the Pentium is executing the current task. When a task switch
occurs, the bit is cleared by the Pentium and it must be re-enabled by writing into DR7
if required.
G0 - G3 (Bits 1, 3, 5, and 7) Global Enable :
When one of these bits is set, the breakpoint address in the corresponding register DRn
is monitored at all times, regardless of task. The bit must be cleared by writing into DR7.
RW0 - RW3 (Bits 16, 17, 20, 21, 24, 25, 28, and 29) Read/Write Access :
These bits decide the type of access that must occur at the address in DRn. Table 3.3
gives the list of different access types.
RW    RW bits in register DR7
00    Code fetch
01    Data write
10    Reserved
11    Data read or write
Table 3.3 RW bits
LEN0 - LEN3 Length Fields (Bits 18, 19, 22, 23, 26, 27, 30, and 31) :
The breakpoints are further distinguished by their size. Table 3.4 shows the
different sizes of the breakpoints.
LEN   LEN bits in register DR7
00    1 byte
01    2 bytes, word aligned
10    Reserved
11    4 bytes, dword aligned
Table 3.4 LEN bits
LE (Local Exact) :
The pipelined architecture of the Pentium fetches and decodes the next instruction before
the current one completes. Due to this, the Pentium may not set the status bit in DR6 at
the instant a breakpoint occurs. If you set the local exact bit, the Pentium sets the
corresponding status bit at the same instant at which the breakpoint occurs, while the
Pentium is running the current task. When a task switch occurs this bit is cleared. This
bit applies to all four linear breakpoints.
GE (Global Exact) :
This is similar to the LE bit. If this bit is set, the Pentium reports a breakpoint at
the instant it occurs, regardless of task.
GD (Global debug access) :
When this bit is set, the Pentium denies further access to any of the debug
registers, either for reading or writing.
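The per-breakpoint fields of DR7 (Ln, Gn, RWn and LENn) can be assembled from the bit positions and the codes of Tables 3.3 and 3.4. The Python sketch below only computes the 32-bit value; the names RW_CODES, LEN_CODES and dr7_for_breakpoint are hypothetical, and the LE, GE and GD bits are left untouched.

```python
RW_CODES = {"code_fetch": 0b00, "data_write": 0b01, "data_read_write": 0b11}
LEN_CODES = {1: 0b00, 2: 0b01, 4: 0b11}


def dr7_for_breakpoint(n: int, access: str, length: int,
                       local: bool = True, global_: bool = False,
                       dr7: int = 0) -> int:
    """Return dr7 with the Ln, Gn, RWn and LENn fields of breakpoint n set."""
    if n not in (0, 1, 2, 3):
        raise ValueError("breakpoint number must be 0 to 3")
    value = dr7
    value &= ~(0b11 << (2 * n))         # clear Ln and Gn
    value &= ~(0b1111 << (16 + 4 * n))  # clear RWn and LENn
    value |= int(local) << (2 * n)              # Ln at bits 0, 2, 4, 6
    value |= int(global_) << (2 * n + 1)        # Gn at bits 1, 3, 5, 7
    value |= RW_CODES[access] << (16 + 4 * n)   # RWn at bits 16-17, 20-21, ...
    value |= LEN_CODES[length] << (18 + 4 * n)  # LENn at bits 18-19, 22-23, ...
    return value & 0xFFFFFFFF


# Breakpoint 0: 4-byte data write, enabled locally.
print(hex(dr7_for_breakpoint(0, "data_write", 4)))  # 0xd0001
```

Writing the resulting value into DR7 requires the privileged MOV form described earlier; the sketch deliberately stops at computing the value.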
3.2.7.3 Test Registers
Among the eight test registers (TR0-TR7), only two test registers (TR6-TR7) are currently
defined. Fig. 3.9 shows the bit pattern of the test registers. These registers are used to
check the translation lookaside buffer (TLB) of the paging unit.
[Figure 3.9: bit patterns of the test registers, with a linear address field and a physical
address field.]
Fig. 3.9 Test Registers
Test Register 6
This is the TLB testing command register. By writing into this register, it is
possible to either initiate a write directly into the Pentium's TLB or to perform TLB
lookups. TR6 is divided into fields as follows :
C : This is a command bit. When this bit is cleared, a write to the
TLB is performed. If it is set, the processor performs a TLB
lookup.
The next 7 bits are used as tag attributes for the TLB cache,
either when writing a new entry or when performing a TLB lookup.
W# (bit 5) : Not writable (complement of W)
W (bit 6) : Writable
U# (bit 7) : Not user (complement of U)
U (bit 8) : User
D# (bit 9) : Not dirty (complement of D)
D (bit 10) : Dirty
V (bit 11) : Valid
Test Register 7
This register is the data testing register of the TLB. When a program is performing
writes, the entry to be stored is contained in this register, along with cache set information.
TR7 is divided into fields as follows :
RP : This is the replacement pointer. This field indicates which set of the
TLB's four-way set associative cache to write to.
H : This is the pointer location bit. If this bit is set, the RP field determines
which cache set to write to. If it is cleared, the set is
determined with an internal algorithm.
Physical address (bits 12-31) : This is the data field of the TLB. This field contains
either the physical address to be written into the TLB or the result of a
valid TLB hit.
3.3 Pentium Addressing Modes
When the processor executes an instruction, it performs the specified function on data,
which are referred to as operands. An operand may be part of the instruction, may reside
in one of the internal registers of the processor, may be stored in memory, or may be held
at an I/O port. As a part of programming flexibility, the processor provides different ways
to access these operands from different locations. The different ways by which the
processor can access data are referred to as addressing modes.
The Pentium provides a total of 11 addressing modes for instructions to specify
operands. These addressing modes can be categorized in three groups :
• Register operand addressing
• Immediate operand addressing
• Memory operand addressing
The memory operand addressing modes are further classified as shown in Fig. 3.10.
[Figure 3.10: Pentium addressing modes divide into register operand addressing, immediate
operand addressing and memory operand addressing. Memory operand addressing is subdivided
into direct, register indirect, based, index, scaled index, based index, based scaled index,
based index with displacement, and based scaled index with displacement.]
Fig. 3.10 Pentium addressing modes
Register Operand Addressing Mode
In the register addressing mode, the operand is located in one of the 8, 16 or 32-bit
general purpose registers of the Pentium. Table 3.5 shows the list of internal registers
that can be used as a source or destination operand.
Register             Operand size
                     Byte (Reg 8)    Word (Reg 16)    Double word (Reg 32)
Accumulator          AL, AH          AX               EAX
Base                 BL, BH          BX               EBX
Count                CL, CH          CX               ECX
Data                 DL, DH          DX               EDX
Stack pointer        -               SP               ESP
Base pointer         -               BP               EBP
Source index         -               SI               ESI
Destination index    -               DI               EDI
Code segment         -               CS               -
Data segment         -               DS               -
Stack segment        -               SS               -
E data segment       -               ES               -
F data segment       -               FS               -
G data segment       -               GS               -
Table 3.5 Direct addressing registers and their sizes
Examples :
For 8-bit operand : MOV AL, DL
This instruction copies the lower byte contents of the EDX register to the lower byte of
the EAX register. Both source and destination operands are the internal registers of the
Pentium.
For 16-bit operand : MOV AX, DX
This instruction copies the lower word contents of the EDX register to the lower word of
the EAX register.
For 32-bit operand : MOV EAX, EDX
This instruction copies the contents of the EDX register to the EAX register.
Immediate Operand Addressing Mode
In the immediate operand addressing mode, the operand is a part of the instruction, as
shown in Fig. 3.11. The operand can be 8-bit, 16-bit or 32-bit.
[Figure 3.11: an instruction consisting of an opcode followed by an immediate operand.]
Fig. 3.11 Instruction encoded with an immediate operand
Examples :
For 8-bit operand : MOV AL, 20H
This instruction copies 20H into the lower byte of the EAX register.
For 16-bit operand : MOV AX, 1020H
This instruction copies 1020H into the lower word of the EAX register.
For 32-bit operand : MOV EAX, 10B89C20H
This instruction copies 10B89C20H into the EAX register.
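The effect of byte, word and doubleword moves on the 32-bit EAX register can be modelled with simple masking. The Python sketch below is an illustration only; the function names are hypothetical and the starting value 11223344H is an arbitrary sample.

```python
def write_al(eax: int, value: int) -> int:
    """MOV AL, value: replace only the lowest byte of EAX."""
    return (eax & 0xFFFFFF00) | (value & 0xFF)


def write_ax(eax: int, value: int) -> int:
    """MOV AX, value: replace only the lower word of EAX."""
    return (eax & 0xFFFF0000) | (value & 0xFFFF)


def write_eax(eax: int, value: int) -> int:
    """MOV EAX, value: replace all 32 bits of EAX."""
    return value & 0xFFFFFFFF


eax = 0x11223344                              # sample starting contents
print(hex(write_al(eax, 0x20)))               # 0x11223320
print(hex(write_ax(eax, 0x1020)))             # 0x11221020
print(hex(write_eax(eax, 0x10B89C20)))        # 0x10b89c20
```

The upper bits of EAX are preserved by the 8-bit and 16-bit moves and are replaced only by the 32-bit move.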
Memory Operand Addressing Modes
The remaining 9 addressing modes provide a mechanism for specifying the physical
address of an operand. In the Pentium, the physical address is calculated before any read
or write operation.
The physical address consists of two components : the segment base address and an
effective address. The effective address can be specified in a variety of ways. One way is
to encode the effective address of the operand directly in the instruction. This represents
the direct addressing mode. The effective address can also be generated with combinations
of four addressing elements : base, index, scale factor and displacement,
where
Base : The contents of any general purpose register.
Index : The contents of any general purpose register. The index registers are used to
access the elements of an array, or a string of characters.
Scale : The index register’s value can be multiplied by a scale factor either 1, 2, 4, or 8.
Scaled index mode is especially useful for accessing arrays or structures.
Displacement : An 8, 16 or 32-bit immediate value following the instruction.
The general formula for generating effective address is given as follows :
EA = base + (index x scaling factor) + displacement
Fig. 3.12 shows the registers that can be used to hold the values of segment base,
base, and index.
[Figure 3.12: the physical address (PA) is formed from a segment register (CS, SS, DS, ES,
FS or GS), a base register, an index register multiplied by a scale factor of 1, 2, 4 or 8,
and an 8, 16 or 32-bit displacement.]
Fig. 3.12 Physical address generation
Physical Address (PA) = Segment Base Address (SBA) + Effective Address (EA)
PA = SBA + { Base + (Index x Scale factor) + Displacement }
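The general formula EA = base + (index x scaling factor) + displacement can be tried out with a short Python sketch. The function name effective_address and the register contents used below are hypothetical sample values; wrapping the result to 32 bits is an assumption of the sketch.

```python
def effective_address(base: int = 0, index: int = 0, scale: int = 1,
                      displacement: int = 0) -> int:
    """Return EA = base + (index x scale) + displacement, wrapped to 32 bits."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale factor must be 1, 2, 4 or 8")
    return (base + index * scale + displacement) & 0xFFFFFFFF


EAX, ESI = 0x1000, 0x10  # sample register contents

print(hex(effective_address(displacement=0x159D)))        # displacement only
print(hex(effective_address(base=EAX)))                   # base only
print(hex(effective_address(base=EAX, displacement=24)))  # base + displacement
print(hex(effective_address(base=EAX, index=ESI, scale=4,
                            displacement=24)))            # all four elements
```

Leaving an element at its default value (0 for base, index and displacement, 1 for the scale factor) removes it from the sum, which is how the same formula covers every memory operand addressing mode.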
Now we see the different memory operand addressing modes :
Direct Mode : In this mode, the instruction is having the effective address of the
operand. This effective address is used as an 8, 16 or 32 displacement from the location
specified by the current value in the selected segement register is always DS.
Example : MOV EBX, 159D H
Here, PA=DS + 159D H
Register Indirect Mode : In this mode, the base register gives the effective address
of the operand.
Example : MOV EBX, [EAX]
Here, PA=DS + EAX
Based Mode : In this mode, a base register’s contents are added to a displacement to
form the effective address of the operand.Microprocessors and Microcontrollers __3-25 Pentium Programming
Example : MOV EBX, [ EAX + 24]
Here, PA = DS + EAX + 24
Index Mode : In this mode, an index register’s contents are added to a displacement to
form the effective address of the operand.
Example : MOV EBX, 159DH + [ SI ]
Here, PA = DS + 159DH + SI
Scaled Index Mode : In this mode, an index register’s contents are multiplied by a
scaling factor and then added to displacement to form the effective address of the operand.
Example : MOV EBX, 159DH + [ ESI * 4 ]
Here, PA = DS + 159DH + ( ESI x 4 )
Based Index Mode : In this mode, the contents of a base register are added to the
contents of an index register to form the effective address of the operand.
Example : MOV EBX, [ ESI ][ EAX ]
Here, PA=DS + ESI + EAX
Based Scaled Index Mode : In this mode, the contents of an index register are
multiplied by a scaling factor and then added to the base register to obtain the effective
address of the operand.
Example : MOV EBX, [ ESI * 2 ] [ EAX ]
Here, PA = DS + ( ESI x 2 ) + EAX
Based Index Mode with Displacement : In this mode, the contents of an index
register and the base register and a displacement are all added together to form the
effective address of the operand.
Example : MOV EBX, [ EAX ] [ EDI + 24]
Here, PA = DS + EAX + EDI + 24
Based Scaled Index Mode With Displacement : In this mode, the contents of an
index register are multiplied by a scaling factor and result is then added to the contents of
a base register and displacement to form the effective address of the operand.
Example : MOV EBX, [ EAX ] [ ESI * 4 ] + 24
Here, PA = DS + EAX + ( ESI x 4 ) + 24
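The addressing modes above are all special cases of the general formula EA = base + (index x scale factor) + displacement. The following Python sketch shows this under stated assumptions: the function names, the register contents and the segment base value 20000H are illustrative only and do not come from the text.

```python
def effective_address(base=0, index=0, scale=1, displacement=0):
    """Return the 32-bit effective address EA = base + index*scale + displacement."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale factor must be 1, 2, 4 or 8")
    # Wrap to 32 bits, as a 32-bit address adder would.
    return (base + index * scale + displacement) & 0xFFFFFFFF


def physical_address(segment_base, ea):
    """Add the segment base address (SBA) to the effective address (EA)."""
    return (segment_base + ea) & 0xFFFFFFFF


# Based scaled index mode with displacement: [EAX][ESI * 4] + 24
eax, esi = 0x1000, 0x10           # assumed register contents
ea = effective_address(base=eax, index=esi, scale=4, displacement=24)
pa = physical_address(0x20000, ea)  # 0x20000 is an assumed segment base

# Direct mode is the same formula with only a displacement.
direct_ea = effective_address(displacement=0x159D)
```

Leaving out an element simply sets it to zero (or the scale to 1), which is why one function covers every mode listed in this section.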
3.4 Pentium Data Types
The Pentium can handle data types of 8 (byte), 16 (word), 32 (doubleword), and
64 (quadword) bits in length. Table 3.6 lists the data types supported by the Pentium
processor.
Byte unsigned integer : bits 7 - 0
Word unsigned integer : bits 15 - 0
Double word unsigned integer : bits 31 - 0
Quad word unsigned integer : bits 63 - 0
Byte signed integer (2's complement form) : bits 7 - 0
Word signed integer (2's complement form) : bits 15 - 0
Double word signed integer (2's complement form) : bits 31 - 0
Quad word signed integer (2's complement form) : bits 63 - 0
Single precision floating point : S (bit 31), Exponent (bits 30 - 23), Mantissa (bits 22 - 0)
Double precision floating point : S (bit 63), Exponent (bits 62 - 52), Mantissa (bits 51 - 0)
Double extended precision floating point : S (bit 79), Exponent (bits 78 - 64), Mantissa (bits 63 - 0)
Fig. 3.13 Pentium numeric data formats
Data Type | Description
General | 8-bit (byte), 16-bit (word), 32-bit (double word), and 64-bit (quadword) locations.
Integer | A signed binary value represented in 2's complement form. It can be byte, word or doubleword in length.
Ordinal | An unsigned integer. It can be byte, word, or doubleword in length.
Unpacked BCD (Binary Coded Decimal) | One BCD digit (0 - 9) in one byte.
Packed BCD | Two BCD digits in one byte.
Near Pointer | A 32-bit effective address that represents the offset within a segment. Used for references within a segmented memory.
Bit field | Any bit position in the sequence of bits.
Byte string | A contiguous sequence of bytes.
Floating point | IEEE standard formats.
Table 3.6 Data types
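The difference between the Integer, Ordinal and BCD data types is only in how the same bits are interpreted. The following Python sketch illustrates this; the function names are hypothetical, and placing the tens digit in the high nibble of a packed BCD byte is the usual convention assumed here.

```python
def as_integer(value, bits):
    """Interpret a bit pattern as a signed integer in 2's complement form."""
    mask = (1 << bits) - 1
    value &= mask
    sign_bit = 1 << (bits - 1)
    # If the sign bit is set, the pattern represents value - 2**bits.
    return value - (1 << bits) if value & sign_bit else value


def as_ordinal(value, bits):
    """Interpret the same bit pattern as an unsigned integer (ordinal)."""
    return value & ((1 << bits) - 1)


def pack_bcd(tens, units):
    """Pack two decimal digits (0 - 9) into one byte (packed BCD)."""
    if not (0 <= tens <= 9 and 0 <= units <= 9):
        raise ValueError("BCD digits must be in the range 0 - 9")
    return (tens << 4) | units


def unpack_bcd(byte):
    """Split a packed BCD byte into its two decimal digits."""
    return (byte >> 4) & 0x0F, byte & 0x0F
```

For example, the byte FFH is 255 as an ordinal but -1 as an integer, and the decimal number 47 is stored as the single byte 47H in packed BCD.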
3.5 Instruction Set Summary
3.5.1 Data Transfer Instructions
The data transfer instructions move data between memory and the general-purpose
and segment registers. They also perform specific operations such as conditional moves,
stack access, and data conversion.
MOV: Move data between general-purpose registers; move data between memory and
general-purpose or segment registers; move immediates to general-purpose registers
XCHG : Exchange
BSWAP : Byte swap
XADD : Exchange and add
CMPXCHG : Compare and exchange
CMPXCHG8B : Compare and exchange 8 bytes
PUSH : Push onto stack
POP : Pop off of stack
PUSHA/PUSHAD : Push general-purpose registers onto stack
POPA/POPAD : Pop general-purpose registers from stack
CWD/CDQ : Convert word to doubleword/Convert doubleword to quadword
CBW/CWDE : Convert byte to word/Convert word to doubleword in EAX register
MOVSX : Move and sign extend
MOVZX : Move and zero extend
3.5.2 Binary Arithmetic Instructions
The binary arithmetic instructions perform basic binary integer computations on byte,
word, and doubleword integers located in memory and/or the general purpose registers.
ADD : Integer add
ADC : Add with carry
SUB : Subtract
SBB : Subtract with borrow
IMUL : Signed multiply
MUL : Unsigned multiply
IDIV : Signed divide
DIV : Unsigned divide
INC : Increment
DEC : Decrement
NEG : Negate
CMP : Compare
3.5.3 Decimal Arithmetic Instructions
The decimal arithmetic instructions perform decimal arithmetic on binary coded
decimal (BCD) data.
DAA : Decimal adjust after addition
DAS : Decimal adjust after subtraction
AAA: ASCII adjust after addition
AAS : ASCII adjust after subtraction
AAM : ASCII adjust after multiplication
AAD : ASCII adjust before division
3.5.4 Logical Instructions
The logical instructions perform basic AND, OR, XOR, and NOT logical operations on
byte, word, and doubleword values.
AND : Perform bitwise logical AND
OR : Perform bitwise logical OR
XOR : Perform bitwise logical exclusive OR
NOT : Perform bitwise logical NOT
3.5.5 Shift and Rotate Instructions
The shift and rotate instructions shift and rotate the bits in word and doubleword
operands.
SAR : Shift arithmetic right
SHR : Shift logical right
SAL/SHL : Shift arithmetic left/Shift logical left
SHRD : Shift right double
SHLD : Shift left double
ROR: Rotate right
ROL: Rotate left
RCR : Rotate through carry right
RCL: Rotate through carry left
3.5.6 Bit and Byte Instructions
Bit instructions test and modify individual bits in word and doubleword operands.
Byte instructions set the value of a byte operand to indicate the status of flags in the
EFLAGS register.
BT: Bit test
BTS : Bit test and set
BTR : Bit test and reset
BTC: Bit test and complement
BSF : Bit scan forward
BSR : Bit scan reverse
SETE/SETZ : Set byte if equal/Set byte if zero
SETNE/SETNZ : Set byte if not equal /Set byte if not zero
SETA/SETNBE : Set byte if above/Set byte if not below or equal
SETAE/SETNB/SETNC : Set byte if above or equal/Set byte if not below/Set byte if not carry
SETB/SETNAE/SETC : Set byte if below/Set byte if not above or equal/Set byte if carry
SETBE/SETNA : Set byte if below or equal/Set byte if not above
SETG/SETNLE : Set byte if greater/Set byte if not less or equal
SETGE/SETNL : Set byte if greater or equal/Set byte if not less
SETL/SETNGE : Set byte if less/Set byte if not greater or equal
SETLE/SETNG : Set byte if less or equal/Set byte if not greater
SETS : Set byte if sign (negative)
SETNS : Set byte if not sign (non-negative)
SETO : Set byte if overflow
SETNO : Set byte if not overflow
SETPE/SETP : Set byte if parity even/Set byte if parity
SETPO/SETNP : Set byte if parity odd/Set byte if not parity
TEST : Logical compare
3.5.7 Control Transfer Instructions
The control transfer instructions provide jump, conditional jump, loop, and call and
return operations to control program flow.
JMP : Jump
JE/JZ : Jump if equal/Jump if zero
JNE/JNZ : Jump if not equal/Jump if not zero
JA/JNBE : Jump if above/Jump if not below or equal
JAE/JNB : Jump if above or equal/Jump if not below
JB/JNAE : Jump if below/Jump if not above or equal
JBE/JNA : Jump if below or equal/Jump if not above
JG/JNLE : Jump if greater/Jump if not less or equal
JGE/JNL : Jump if greater or equal/Jump if not less
JL/JNGE : Jump if less/Jump if not greater or equal
JLE/JNG : Jump if less or equal/Jump if not greater
JC: Jump if carry
JNC : Jump if not carry
JO: Jump if overflow
JNO : Jump if not overflow
JS: Jump if sign (negative)
JNS : Jump if not sign (non-negative)
JPO/JNP : Jump if parity odd/Jump if not parity
JPE/JP : Jump if parity even/Jump if parity
JCXZ/JECXZ : Jump register CX zero/Jump register ECX zero
LOOP : Loop with ECX counter
LOOPZ/LOOPE : Loop with ECX and zero/Loop with ECX and equal
LOOPNZ/LOOPNE : Loop with ECX and not zero/Loop with ECX and not equal
CALL : Call procedure
RET : Return
IRET : Return from interrupt
INT : Software interrupt
INTO : Interrupt on overflow
BOUND : Detect value out of range
3.5.8 String Instructions
The string instructions operate on strings of bytes, allowing them to be moved to and
from memory.
MOVS/MOVSB : Move string/Move byte string
MOVS/MOVSW : Move string/Move word string
MOVS/MOVSD : Move string/Move doubleword string
CMPS/CMPSB : Compare string/Compare byte string
CMPS/CMPSW : Compare string/Compare word string
CMPS/CMPSD : Compare string/Compare doubleword string
SCAS/SCASB : Scan string/Scan byte string
SCAS/SCASW : Scan string/Scan word string
SCAS/SCASD : Scan string/Scan doubleword string
LODS/LODSB : Load string/Load byte string
LODS/LODSW : Load string/Load word string
LODS/LODSD : Load string/Load doubleword string
STOS/STOSB : Store string/Store byte string
STOS/STOSW : Store string/Store word string
STOS/STOSD : Store string/Store doubleword string
REP : Repeat while ECX not zero
REPE/REPZ : Repeat while equal/Repeat while zero
REPNE/REPNZ : Repeat while not equal/Repeat while not zero
3.5.9 I/O Instructions
These instructions move data between the processor’s I/O ports and a register or
memory.
IN: Read from a port
OUT : Write to a port
INS/INSB : Input string from port/Input byte string from port
INS/INSW : Input string from port/Input word string from port
INS/INSD : Input string from port/Input doubleword string from port
OUTS/OUTSB : Output string to port/Output byte string to port
OUTS/OUTSW : Output string to port/Output word string to port
OUTS/OUTSD : Output string to port/Output doubleword string to port
3.5.10 Enter and Leave Instructions
These instructions provide machine-language support for procedure calls in
block-structured languages.
ENTER : High-level procedure entry
LEAVE : High-level procedure exit
3.5.11 Flag Control (EFLAG) Instructions
The flag control instructions operate on the flags in the EFLAGS register.
STC : Set carry flag
CLC: Clear the carry flag
CMC: Complement the carry flag
CLD : Clear the direction flag
STD : Set direction flag
LAHF : Load flags into AH register
SAHF : Store AH register into flags
PUSHF/PUSHFD : Push EFLAGS onto stack
POPF/POPFD : Pop EFLAGS from stack
STI: Set interrupt flag
CLI: Clear the interrupt flag
3.5.12 Segment Register Instructions
The segment register instructions allow far pointers (segment addresses) to be loaded
into the segment registers.
LDS : Load far pointer using DS
LES : Load far pointer using ES
LFS : Load far pointer using FS
LGS : Load far pointer using GS
LSS : Load far pointer using SS
3.5.13 Miscellaneous Instructions
The miscellaneous instructions provide such functions as loading an effective address,
executing a “no-operation,” and retrieving processor identification information.
LEA : Load effective address
NOP : No operation
XLAT/XLATB : Table lookup translation
CPUID : Processor identification
Protected Mode
4.1 Introduction
In this chapter we will see the protected mode features of the Pentium processor. The
complete capabilities of the Pentium processor are unlocked when it operates in Protected
Mode. After reset, the Pentium processor enters real mode, but by setting bit 0 (the PE bit)
of the CR0 register it is possible to operate the Pentium processor in Protected Mode.
Features of Protected Mode :
1. Protected Mode vastly increases the linear address space to four gigabytes
(2^32 bytes) and allows the running of virtual memory programs of almost
unlimited size (64 terabytes or 2^46 bytes).
2. Protected Mode allows the Pentium processor to run all of the existing 8086
and 80286 programs.
3. It provides a sophisticated memory management and a hardware-assisted
protection mechanism.
4. It provides special Pentium processor instructions for multitasking operating
systems.
5. It supports paging mechanism.
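The sizes quoted in feature 1 can be checked with simple arithmetic. The following Python sketch assumes the usual derivation of the virtual address space: a 13-bit descriptor index, two descriptor tables (global and local) and segments of up to 4 gigabytes each. The constant names are illustrative only.

```python
DESCRIPTORS_PER_TABLE = 2 ** 13   # 13-bit index field of a selector
DESCRIPTOR_TABLES = 2             # global table or local table
MAX_SEGMENT_SIZE = 2 ** 32        # a segment can be up to 4 gigabytes

GIGABYTE = 2 ** 30
TERABYTE = 2 ** 40

# Linear address space: 32 address bits.
linear_space = 2 ** 32

# Virtual address space: every selectable segment at its maximum size.
virtual_space = DESCRIPTOR_TABLES * DESCRIPTORS_PER_TABLE * MAX_SEGMENT_SIZE

linear_gigabytes = linear_space // GIGABYTE
virtual_terabytes = virtual_space // TERABYTE
```

Under these assumptions the linear space works out to 4 gigabytes and the virtual space to 2^14 x 2^32 = 2^46 bytes, i.e. 64 terabytes.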
4.2 Protected Mode-Support Registers
Fig. 4.1 shows the protected mode register set of the Pentium processor. It is a
superset of the real mode register set. It has additional registers. These are :
1. Global Descriptor Table Register (GDTR) - 48 bits : It holds the 32-bit linear base
address and 16-bit limit of the Global Descriptor Table (GDT).
2. Interrupt Descriptor Table Register (IDTR) - 48 bits : It holds the 32-bit linear base
address and 16-bit limit of the Interrupt Descriptor Table (IDT).
3. Local Descriptor Table Register (LDTR)- 16 bits : It holds the 16-bit selector for the
Local Descriptor Table Descriptor.
4. Task Register (TR)- 16 bits : It holds the 16 bit selector for the Task State Segment
Descriptor.
[Figure: register model showing the general purpose registers, the segment registers, the instruction pointer and flags register, the system address registers (GDTR, IDTR, LDTR, TR), the control registers, the debug registers and the test registers.]
Fig. 4.1 Protected mode register model of Pentium processor
In the protected mode register set, the functions of a few registers have been extended.
1. The instruction pointer is now 32 bits wide. It is called EIP.
2. More bits of the flag register (EFLAGS) are active.
3. All five control registers CR0-CR4 are active.
The following sections describe the functions of these registers in detail.
4.3 Logical to Physical Address Translation
The Pentium processor has three distinct address spaces : Logical, linear and physical.
A logical address (also known as virtual address) consists of a selector and an offset. A
selector is the contents of a segment register.
[Figure: the 16-bit selector picks a descriptor from a descriptor table to obtain the linear base address; segment translation adds the 32-bit offset; if paging is enabled, page translation then produces the 32-bit physical address.]
Fig. 4.2 Address translation overview
We know that, in real mode, the segmentation unit shifts the selector left four bits and
adds the result to the offset to form the linear address. In protected mode, every
segment selector has a linear base address associated with it, which is stored in the
segment descriptor. A selector is used to point to a descriptor for the segment in a table of
descriptors. The linear base address from the descriptor is then added to the 32-bit offset
to generate the 32-bit linear address. This process is known as segmentation or segment
translation. If the paging unit is not enabled, the 32-bit linear address corresponds to the
physical address. If the paging unit is enabled, the paging mechanism translates the linear
address space into the physical address space by page translation. This is illustrated in
Fig. 4.2. The following sections describe the segment translation and page translation
mechanism in detail.
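The difference between the two ways of forming a linear address can be sketched in Python as follows. The function names and the example values are assumptions made for illustration; the descriptor base is passed in directly here instead of being read from a descriptor table.

```python
def real_mode_linear(selector, offset):
    """Real mode: shift the 16-bit selector left four bits and add the 16-bit offset."""
    return ((selector & 0xFFFF) << 4) + (offset & 0xFFFF)


def protected_mode_linear(descriptor_base, offset):
    """Protected mode: add the 32-bit offset to the base stored in the descriptor."""
    return (descriptor_base + offset) & 0xFFFFFFFF


# Real mode: selector 1234H, offset 0010H
real = real_mode_linear(0x1234, 0x0010)

# Protected mode: assumed descriptor base 00400000H, offset 00001234H
protected = protected_mode_linear(0x00400000, 0x00001234)
```

In real mode the selector value itself determines the segment start, whereas in protected mode the selector only identifies a descriptor and the segment can start anywhere in the 4 gigabyte linear address space.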
4.4 Segmentation
Segmentation or segment translation is the process of converting a logical address into a
linear address. Fig. 4.3 shows the segment translation mechanism. It shows how a selector is
used to access a descriptor in a descriptor table. The 13-bit index part of the selector is
multiplied by 8 and used as a pointer to the desired descriptor in a descriptor table. The
index value is multiplied by 8 because each descriptor requires 8 bytes in the descriptor
table. The descriptor in the descriptor table mainly contains the base address, segment limit
and access rights byte. The Pentium processor adds the base address from the descriptor to
the effective address or offset to generate a linear address.
Fig. 4.3 Segment translation mechanism
As shown in the Fig. 4.3, the selector component of each logical address contains 2 bits
which represent the privilege level of the program section requesting access to a segment.
The descriptor of each segment contains 2 bits which represent the privilege level of that
segment. When an executing program attempts to access a segment, the memory
management unit compares the privilege level in the selector with the privilege level in the
descriptor. If the segment selector has the same or greater privilege level, then the memory
management unit allows the segment to be accessed. If the selector privilege level is lower
than the privilege level of the segment, the memory management unit denies the access
and sends an interrupt signal to the CPU indicating a privilege level violation.
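The privilege comparison described above can be sketched in Python as follows. This is a simplified illustration for data segment access only. It assumes the usual numbering, in which level 0 is the most privileged and level 3 the least, and it also takes the current privilege level (CPL) of the running program into account, which the paragraph above does not mention. The function name is hypothetical.

```python
def may_access_data_segment(cpl, rpl, dpl):
    """Return True if access to a data segment is allowed.

    cpl : current privilege level of the executing program (0 - 3)
    rpl : requested privilege level, the 2 bits held in the selector (0 - 3)
    dpl : descriptor privilege level of the target segment (0 - 3)
    """
    for level in (cpl, rpl, dpl):
        if not 0 <= level <= 3:
            raise ValueError("privilege levels are 2-bit values (0 - 3)")
    # The numerically larger of CPL and RPL is the weaker privilege.
    effective_level = max(cpl, rpl)
    # Access is allowed only if the requester is at least as privileged
    # as the segment, i.e. its level number is not larger than DPL.
    return effective_level <= dpl
```

For instance, a program at level 3 is refused access to a level 0 segment, while a program at level 0 may access a level 3 segment.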
There are two major categories of descriptor table in a Pentium processor system :
Global and Local. The Global Descriptor Table (GDT) is a general purpose table of
descriptors which can be used by all programs to reference segments of memory. Local
Descriptor Tables (LDTs), in contrast, are set up in the system for an individual task or a
closely related group of tasks. The table indicator (TI) bit in the selector decides which
descriptor table is referred to by the selector. When the TI bit is 0, the index portion of the
selector refers to a descriptor in the GDT. When the TI bit is 1, it refers to a descriptor in
the current LDT. This is illustrated in Fig. 4.4.
Fig. 4.4 Selector and descriptor tables
Fig. 4.4 shows that the first entry in the GDT is reserved by the processor and should
be all zeros. This is known as the NULL descriptor. The processor does not cause an
exception when a segment register (other than CS or SS) is loaded with a null selector.
However, it will cause an exception when the segment register is used to access memory.
This feature is useful for initialising unused segment registers so as to trap accidental
references.
The Pentium processor has six segment registers :
• One for current code segment (CS)
• One for current stack segment (SS)
• Four for general data segments (DS, ES, FS, GS)
Segment registers (selectors) select segment descriptors :
• Thirteen bits select descriptor
• One bit selects descriptor table
• Two bits aid privilege checking
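The three fields of a selector can be separated with shifts and masks. The following Python sketch assumes the standard selector layout, with the index in bits 15 - 3, the table indicator (TI) in bit 2 and the privilege bits in bits 1 - 0. The function name and the dictionary keys are illustrative.

```python
def decode_selector(selector):
    """Split a 16-bit selector into index, table indicator and privilege bits."""
    selector &= 0xFFFF
    index = selector >> 3           # upper 13 bits select the descriptor
    ti = (selector >> 2) & 1        # 0 = GDT, 1 = current LDT
    rpl = selector & 0b11           # 2 bits used in privilege checking
    return {
        "index": index,
        "table": "LDT" if ti else "GDT",
        "rpl": rpl,
        # Each descriptor occupies 8 bytes, so the byte offset of the
        # descriptor inside its table is index * 8.
        "descriptor_offset": index * 8,
    }


example = decode_selector(0x002B)
```

Here the selector 002BH decodes to index 5 in the GDT with privilege bits equal to 3, so its descriptor starts 40 bytes into the table.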
4.5 Segment Descriptors and Memory Management through
Segmentation
In protected mode, the memory management unit (MMU) uses the segment selector to
access a descriptor for the desired segment in a table of descriptors in memory. A segment
descriptor is a special structure which describes the segment. Exactly one segment
descriptor must be defined for each segment of the memory.
Descriptors are eight byte quantities which contain attributes about a given region of
linear address space (i.e. a segment). These attributes include the 32-bit base linear address
of the segment, the 20-bit length and granularity of the segment, the protection level, read,
write or execute privileges, the default size of the operands (16-bit or 32-bit), and the type
of segment. Fig. 4.5 shows the general format of a descriptor. As shown in Fig. 4.5, the
segment descriptor has the following fields.
Base : It contains the 32-bit base address for a segment. It thus defines the location of
the segment within the 4 gigabyte linear address space. The Pentium processor
concatenates the three fragments of the base address to form a single 32-bit address.
Limit : It defines the size of the segment. The Pentium processor concatenates the two
fragments of the limit field to form a 20-bit value. The Pentium processor interprets this
20-bit value in two ways, depending on the setting of the granularity bit (G) :
If G bit = 0 : In units of one byte, to define a limit of up to 1 Mbyte (2^20 bytes).
If G bit = 1 : In units of 4 kilobytes, to define a limit of up to 4 gigabytes.
Granularity Bit : It specifies the units with which the limit field is interpreted. When
the bit is 0, the limit is interpreted in units of one byte; otherwise the limit is interpreted in
units of 4 Kbytes.
Bits 31 ... 16 | Bits 15 ... 0 | Byte address
SEGMENT BASE 15 ... 0 | SEGMENT LIMIT 15 ... 0 | 0
BASE 31 ... 24, G, D, 0, AVL, LIMIT 19 ... 16 | Access rights byte (P, DPL, S, TYPE, A), BASE 23 ... 16 | +4
BASE : Base address of the segment
LIMIT : The length of the segment
P : Present Bit : 1 = Present, 0 = Not present
DPL : Descriptor Privilege Level 0 - 3
S : Segment Descriptor : 0 = System Descriptor, 1 = Code or Data Segment Descriptor
TYPE : Type of segment
A : Accessed Bit
G : Granularity Bit : 1 = Segment length is page granular, 0 = Segment length is byte granular
D : Default Operation Size (recognised in code segment descriptors only) : 1 = 32-bit segment, 0 = 16-bit segment
0 : Bit must be zero (0) for compatibility with future processors
AVL : Available field for user or OS
Note : In a maximum-size segment (i.e. a segment with G = 1 and segment limit 19 ... 0 = FFFFFH), the lowest 12 bits of the segment base should be zero (i.e. segment base 11 ... 0 = 000H).
Fig. 4.5 General Segment Descriptor Format
D (Default size) : When this bit is cleared, operands contained within this segment are
assumed to be 16 bits in size. When it is set, operands are assumed to be 32 bits.
0 (Reserved by Intel) : It can neither be defined nor be used by the user. This bit
must be zero for compatibility with future processors.
AVL/U (User Bit) : This bit is completely undefined, and the Pentium processor ignores it.
This field/bit is available to the user or the operating system.
Access Rights Byte
P (Present Bit) : The present (P) bit is 1 if the segment is loaded in the physical
memory. If P = 0, any attempt to access this segment causes a not present exception
(exception 11).
DPL (Descriptor Privilege Level) : It is a 2-bit field which defines the level of privilege
associated with the memory space that the descriptor defines. DPL 0 is the most privileged
whereas DPL 3 is the least privileged.
S (System Bit) : The segment S bit in the segment descriptor determines if a given
segment is a system segment or a code or a data segment. If the S bit is 1 then the
segment is either a code or data segment; if it is 0 then the segment is a system segment.
Type : This specifies the specific descriptor among the various kinds of descriptors.
(A detailed explanation is given in the following sections.)
A (Accessed Bit) : The Pentium processor automatically sets this bit when a selector
for the descriptor is loaded into a segment register. This means that the Pentium processor
sets the accessed bit whenever a memory reference is made by accessing the segment.
A Segment Descriptor :
• Describes a segment.
• Must be created for every segment.
• Is created by the programmer.
• Determines a base address of the segment.
• Determines a size of the segment.
• Determines a type of the segment.
• Determines a privilege level of the segment.
Segment Descriptor Defines :
• Base address (32 bits)
• Segment limit (20 bits)
• Type of segment (4 bits)
• Privilege level of segment (2 bits)
• Whether segment is physically present (1 bit)
• Whether segment has been accessed before (1 bit)
• Granularity of limit field (1 bit)
• Size of operands within segment (1 bit)
• Intel reserved bit (1 bit)
• AVL bit (1 bit)
• Default size (1 bit)
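The way the processor reassembles the fragmented base and limit fields can be sketched in Python. The sketch assumes the descriptor is stored in memory as 8 little-endian bytes in the standard layout (limit 15 - 0, base 15 - 0, base 23 - 16, access rights byte, a byte holding G, D, 0, AVL and limit 19 - 16, then base 31 - 24). The function name, the dictionary keys and the example descriptors are illustrative only.

```python
import struct


def decode_descriptor(raw):
    """Decode an 8-byte segment descriptor into its logical fields."""
    if len(raw) != 8:
        raise ValueError("a descriptor is exactly 8 bytes long")
    (limit_low, base_low, base_mid,
     access, flags_limit, base_high) = struct.unpack("<HHBBBB", raw)

    # Concatenate the three base fragments and the two limit fragments.
    base = (base_high << 24) | (base_mid << 16) | base_low
    limit = ((flags_limit & 0x0F) << 16) | limit_low

    granularity = (flags_limit >> 7) & 1
    default_size = (flags_limit >> 6) & 1
    avl = (flags_limit >> 4) & 1

    # G = 0: limit counts bytes. G = 1: limit counts 4 Kbyte units,
    # so the last valid offset has its low 12 bits all set.
    last_offset = (limit << 12) | 0xFFF if granularity else limit

    return {
        "base": base,
        "limit": limit,
        "granularity": granularity,
        "default_size": default_size,
        "avl": avl,
        "access_rights": access,
        "last_offset": last_offset,
    }


# An assumed 4 gigabyte, 32-bit segment starting at address 0.
flat = decode_descriptor(bytes.fromhex("ffff0000009acf00"))
# An assumed byte granular segment at 12345678H with limit 0ABCDH.
small = decode_descriptor(bytes.fromhex("cdab785634920012"))
```

With G = 1 and a limit of FFFFFH the last valid offset is FFFFFFFFH, which is how a 20-bit limit field can describe a 4 gigabyte segment.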
4.5.1 Types of Segment Descriptors
Fig. 4.6 shows the types of segment descriptors. As shown in Fig. 4.6, there are
two main categories of segments : system segments and non-system segments. These two
basic types are further categorised into five types.
[Figure: System descriptors : LDT, TSS, Gate. Non-system descriptors : Code, Data.]
Fig. 4.6 Types of segment descriptors
4.5.1.1 Non-system Segment Descriptor
The code and data segment descriptors are the non-system segment descriptors. Fig. 4.7
shows the general format for code and data segment descriptors and Table 4.1 illustrates
how the specific bits in the access rights byte are interpreted in data and code segment
descriptors.
Bits 31 ... 16 | Bits 15 ... 0 | Byte address
SEGMENT BASE 15 ... 0 | SEGMENT LIMIT 15 ... 0 | 0
BASE 31 ... 24, G, D/B, 0, AVL, LIMIT 19 ... 16 | ACCESS RIGHTS BYTE, BASE 23 ... 16 | +4
D/B : 1 = Default instruction attributes are 32 bits, 0 = Default instruction attributes are 16 bits
AVL : Available field for user or OS
G : Granularity Bit : 1 = Segment length is page granular, 0 = Segment length is byte granular
0 : Bit must be zero (0) for compatibility with future processors
Note : In a maximum-size segment (i.e. a segment with G = 1 and segment limit 19 ... 0 = FFFFFH), the lowest 12 bits of the segment base should be zero (i.e. segment base 11 ... 0 = 000H).
Fig. 4.7 General format for code/data segment descriptor
Bit Position | Name | Function
7 | Present (P) | P = 1 : Segment is mapped into physical memory. P = 0 : No mapping to physical memory exists, base and limit are not used.
6 - 5 | Descriptor Privilege Level (DPL) | Segment privilege attribute used in privilege tests.
4 | Segment Descriptor (S) | S = 1 : Code or Data (includes stacks) segment descriptor. S = 0 : System segment descriptor or Gate descriptor.
If data segment (S = 1, E = 0) :
3 | Executable (E) | E = 0 : Descriptor type is data segment.
2 | Expansion Direction (ED) | ED = 0 : Expand up segment, offsets must be <= limit. ED = 1 : Expand down segment, offsets must be > limit.
1 | Writeable (W) | W = 0 : Data segment may not be written into. W = 1 : Data segment may be written into.
If code segment (S = 1, E = 1) :
3 | Executable (E) | E = 1 : Descriptor type is code segment.
2 | Conforming (C) | C = 1 : Code segment may only be executed when CPL >= DPL and CPL remains unchanged.
1 | Readable (R) | R = 0 : Code segment may not be read. R = 1 : Code segment may be read.
For both :
0 | Accessed (A) | A = 0 : Segment has not been accessed. A = 1 : Segment selector has been loaded into segment register or used by selector test instructions.
Table 4.1 Access rights for segments
The Executable (E) bit indicates whether the segment is a code or a data segment. If the E
bit is 1, the segment is a code segment; otherwise it is a data segment. The code segment may
be executable only, or executable and readable. This is determined by the Readable (R) bit.
If the R bit is 1, the code segment is executable and readable; otherwise it is only
executable. If the conforming bit (C) is 1, the code segment can be executed and shared by
programs at different privilege levels.
In the case of a stack segment, the segment starts at the base linear address plus the
maximum segment limit, whereas a data segment starts at the base linear address and expands
to the base linear address plus limit, as shown in Fig. 4.8.
[Figure: a data segment starts at the base linear address and expands upward to base + limit; a stack segment starts at base + maximum limit and expands downward towards base + limit.]
Fig. 4.8 Expansion direction for data and stack segment
The expansion direction (ED) bit specifies the expansion direction for the segment. If
ED = 0, the expansion direction is upward, as for a data segment, and if ED = 1, the
expansion direction is downward, as for a stack segment.
The write (W) bit for a data segment indicates whether the data segment is read only or
read and write. If the W bit is 0, the data segment is read only; otherwise it is a read/write
segment. For a stack segment the W bit must be logic 1.
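The effect of the ED bit on limit checking can be sketched in Python. The sketch follows the rule that offsets must not exceed the limit in an expand up segment and must be greater than the limit in an expand down segment. The function name is hypothetical, and the upper bound of FFFFFFFFH for an expand down segment is an assumption that applies to 32-bit segments.

```python
def offset_is_valid(offset, limit, expand_down=False, upper_bound=0xFFFFFFFF):
    """Check an offset against the segment limit.

    expand_down=False : expand up segment, valid offsets are 0 ... limit
    expand_down=True  : expand down segment, valid offsets are
                        limit + 1 ... upper_bound
    """
    if expand_down:
        return limit < offset <= upper_bound
    return 0 <= offset <= limit


# The same limit value selects complementary ranges of offsets.
in_data_segment = offset_is_valid(0x0FFF, limit=0x0FFF)
in_stack_segment = offset_is_valid(0x0FFF, limit=0x0FFF, expand_down=True)
```

This shows why an expand down segment suits a stack: lowering the limit makes more offsets valid at the bottom of the segment, which is the direction in which a stack grows.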
Fig. 4.9 (a) and (b) show the access rights byte configuration of the code segment
descriptor and of the data segment descriptor.
Bit 7 (MSB) : Present (1 = yes)
Bits 6 - 5 : Descriptor Privilege Level
Bit 4 : 1 (indicates segment descriptor for code or data)
Bit 3 : Executable (1 = yes for code)
Bit 2 : Conforming (1 = yes)
Bit 1 : Readable (1 = yes)
Bit 0 (LSB) : Accessed (1 = yes)
Fig. 4.9 (a) Code segment descriptor access right byte configuration
Bit 7 (MSB) : Present (1 = yes)
Bits 6 - 5 : Descriptor Privilege Level
Bit 4 : 1 (indicates segment descriptor for code or data)
Bit 3 : Executable (0 = no for data)
Bit 2 : Expand down (1 = down)
Bit 1 : Writeable (1 = yes)
Bit 0 (LSB) : Accessed (1 = yes)
Fig. 4.9 (b) Data segment descriptor access right byte configuration
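The interpretation of the access rights byte can be sketched in Python as follows. The bit positions are those of Table 4.1; the function name, the dictionary keys and the example byte values are illustrative and not taken from the text.

```python
def decode_access_rights(byte):
    """Decode the access rights byte of a segment descriptor."""
    byte &= 0xFF
    fields = {
        "present": bool(byte & 0x80),   # bit 7
        "dpl": (byte >> 5) & 0b11,      # bits 6 - 5
    }
    if not byte & 0x10:                 # bit 4 (S) = 0
        fields["kind"] = "system"
        fields["type"] = byte & 0x0F    # 4-bit system segment type
        return fields

    fields["accessed"] = bool(byte & 0x01)          # bit 0
    if byte & 0x08:                                 # bit 3 (E) = 1
        fields["kind"] = "code"
        fields["conforming"] = bool(byte & 0x04)    # bit 2
        fields["readable"] = bool(byte & 0x02)      # bit 1
    else:
        fields["kind"] = "data"
        fields["expand_down"] = bool(byte & 0x04)   # bit 2
        fields["writeable"] = bool(byte & 0x02)     # bit 1
    return fields


code_rights = decode_access_rights(0x9A)
data_rights = decode_access_rights(0x92)
```

Bits 2 and 1 change meaning with the E bit, which is why the decoder has to test bit 3 before naming them.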
4.5.1.2 System Segment Descriptors
System segment descriptors give information about operating system tables, tasks and gates.
Fig. 4.10 shows the general format of a system segment descriptor. From Fig. 4.10 it can be
seen that several descriptor fields (Base address, limit, Granularity bit G and Present bit P)
are similar to the general segment descriptor. Fig. 4.10 also shows the various types of
system segment descriptors. Let us discuss the various system segment descriptors.
a) LDT Descriptors (S = 0, Type = 2) :
The LDT descriptors are present only in the Global Descriptor Table (GDT). They
contain the information about the local descriptor tables. The local descriptor tables
contain the segment descriptors which are unique to a particular task. The DPL
(Descriptor Privilege Level) field of this descriptor is ignored because it can be accessed
only at privilege level 0.
b) TSS Descriptor (S = 0, Type = 1, 3, 9, B) :
In a multitasking environment the computer performs more than one task at a time, and
it also switches between the tasks. A task can be a single program, or it can be a group of
related programs. When it switches from task 1 to task 2, it stores all the information
necessary to restart task 1 later exactly as it was left. This involves saving the
contents of all of the processor registers as well as any read/write memory variables and
the address of the next instruction to be executed. Such information is called the state of
the task or the context of the task.
Bits 31 ... 16 | Bits 15 ... 0 | Byte address
Segment Base 15 ... 0 | Segment Limit 15 ... 0 | 0
Base 31 ... 24, G, 0, 0, 0, Limit 19 ... 16 | P, DPL, 0, TYPE, Base 23 ... 16 | +4
Type | Defines
0 | Invalid
1 | Available 80286 TSS
2 | LDT
3 | Busy 80286 TSS
4 | 80286 Call Gate
5 | Task Gate (for 80286 or Intel Pentium processor Task)
6 | 80286 Interrupt Gate
7 | 80286 Trap Gate
8 | Invalid
9 | Available Intel Pentium processor TSS
A | Undefined (Intel Reserved)
B | Busy Intel Pentium processor TSS
C | Intel Pentium processor Call Gate
D | Undefined (Intel Reserved)
E | Intel Pentium processor Interrupt Gate
F | Intel Pentium processor Trap Gate
Note : In a maximum-size segment (i.e. a segment with G = 1 and segment limit 19 ... 0 = FFFFFH), the lowest 12 bits of the segment base should be zero (i.e. segment base 11 ... 0 = 000H).
Fig. 4.10 System segment descriptor
The Pentium processor uses a special segment called the task state segment (TSS) to store
the state/context of the task. This segment can be addressed with the help of the task state
segment (TSS) descriptor. The TSS descriptor contains information about the location, size
and privilege level of a TSS.
Along with the context of the task, the TSS also contains the linkage field for the next
task, which allows the nesting of tasks. The TSS descriptor gives the base address and limit
for the TSS. Its TYPE field is used to indicate whether the task is currently BUSY (i.e. on a
chain of active tasks) or the TSS is available. The TYPE field also indicates whether the
segment contains an 80286 or a Pentium processor TSS.
c) Gate Descriptors (S = 0, TYPE = 4 - 7, C, F) :
A gate is a special type of descriptor. It allows the Pentium processor to automatically
perform protection checks. There are four types of gate descriptors :
• Call gate
• Task gate
• Interrupt gate
• Trap gate
Call gates are used to change privilege levels (section 4.8.2), task gates are used to perform
a task switch (section 4.12.5), and interrupt and trap gates are used to specify interrupt
service routines.
Fig. 4.11 shows the format of the four types of gate descriptors.
Bits 31 ... 16 | Bits 15 ... 0 | Byte address
Selector | Offset 15 ... 0 | 0
Offset 31 ... 16 | P, DPL, 0, TYPE, 0 0 0, Word Count 4 ... 0 | +4
Name | Value | Description
TYPE | 4 | 80286 call gate
TYPE | 5 | Task gate (for 80286 or Intel Pentium processor task)
TYPE | 6 | 80286 interrupt gate
TYPE | 7 | 80286 trap gate
TYPE | C | Intel Pentium processor call gate
TYPE | E | Intel Pentium processor interrupt gate
TYPE | F | Intel Pentium processor trap gate
P | 0 | Descriptor contents are not valid
P | 1 | Descriptor contents are valid
DPL : Least privileged level at which a task may access the gate.
WORD COUNT : 0 - 31, the number of parameters to copy from the caller's stack to the called procedure's stack. The parameters are 32-bit quantities for Intel Pentium processor gates and 16-bit quantities for 80286 gates.
DESTINATION SELECTOR | 16-bit selector | Selector to the target code segment, or selector to the target task state segment for a task gate
DESTINATION OFFSET | 16-bit offset (80286), 32-bit offset (Pentium processor) | Entry point within the target code segment
Fig. 4.11 Gate descriptor formats
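The gate descriptor fields can be extracted in the same way as those of a segment descriptor. The following Python sketch assumes the standard 8-byte little-endian layout of a 32-bit gate (offset 15 - 0, selector, word count byte, a byte holding P, DPL and TYPE, then offset 31 - 16). The function name and the example call gate are hypothetical.

```python
import struct


def decode_gate(raw):
    """Decode an 8-byte gate descriptor into its logical fields."""
    if len(raw) != 8:
        raise ValueError("a gate descriptor is exactly 8 bytes long")
    offset_low, selector, count, access, offset_high = struct.unpack("<HHBBH", raw)
    return {
        # Entry point within the target code segment.
        "offset": (offset_high << 16) | offset_low,
        # Selector of the target code segment (or TSS for a task gate).
        "selector": selector,
        # Number of parameters to copy between stacks (0 - 31).
        "word_count": count & 0x1F,
        "present": bool(access & 0x80),
        "dpl": (access >> 5) & 0b11,
        "type": access & 0x0F,
    }


# Assumed call gate: selector 0008H, entry point 00401000H,
# 2 parameters, P = 1, DPL = 3, TYPE = C.
gate = decode_gate(bytes.fromhex("0010080002ec4000"))
```

A gate holds a selector and an entry point instead of a base and a limit, which is why it redirects control to another segment and does not describe a region of memory itself.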
4.5.2 Descriptor Tables
As mentioned earlier, segment descriptors are grouped and placed one after the other
in contiguous memory locations. This group arrangement is known as a descriptor table.
The maximum length of a descriptor table is 64 Kbytes, and each descriptor takes 8 bytes to
store the information of a particular segment. So a descriptor table can have as many as
8192 descriptors. The upper 13 bits of a selector are used as an index into the descriptor
table.
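The arithmetic above can be checked with a short sketch. The names used here (DESCRIPTOR_SIZE, selector_index, descriptor_offset) are chosen only for illustration and do not come from any Intel tool.

```python
# Sketch of the descriptor table arithmetic described in the text.
DESCRIPTOR_SIZE = 8            # each descriptor occupies 8 bytes
MAX_TABLE_BYTES = 64 * 1024    # maximum table length, 64 Kbytes

# 65536 / 8 = 8192 descriptors at most
max_descriptors = MAX_TABLE_BYTES // DESCRIPTOR_SIZE


def selector_index(selector: int) -> int:
    """Return the upper 13 bits of a 16-bit selector (the table index)."""
    return (selector & 0xFFFF) >> 3


def descriptor_offset(selector: int) -> int:
    """Byte offset of the selected descriptor from the base of its table."""
    return selector_index(selector) * DESCRIPTOR_SIZE


print(max_descriptors)                 # number of descriptors in a full table
print(selector_index(0xFFFF))          # largest possible index
print(hex(descriptor_offset(0x0010)))  # byte offset for selector 0010H
```

A 13-bit index can take 8192 different values, which is exactly the number of 8-byte descriptors that fit in 64 Kbytes.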
There are three types of descriptor tables
= Global Descriptor Table (GDT)
= Local Descriptor Table (LDT)
= Interrupt Descriptor Table (IDT)
Each of these is used for a different purpose. Thus it is necessary to consider the use of a
segment before deciding in which table it must be included.
The Global Descriptor Table (GDT) is a general purpose table of descriptors that can be
used by all programs to reference segments of memory. The GDT can have any type of
segment descriptor except for descriptors which are used for serving interrupts. The
Interrupt Descriptor Table (IDT) holds the segment descriptors that define interrupt or
exception handling routines. The IDT is a direct replacement for the interrupt vector table
used in an 8086 system. Local Descriptor Tables (LDTs) are set up in the system for an
individual task or a closely related group of tasks. Fig. 4.12 shows how each task uses its
individual memory area defined by the descriptors from the corresponding local descriptor
table and how it shares the memory area defined by the descriptors from the global
descriptor table.
[Figure: virtual address spaces of Task 1, Task 2 and Task 3]
Fig. 4.12 Memory area shared by different tasks
Descriptor Tables
1. Global Descriptor Table (GDT)
= Unique table
= Holds most of the segments; can be used by all programs
= May contain special system descriptors
2. Interrupt Descriptor Table (IDT)
= Unique Table
= Holds segment descriptors that define interrupt or exception service routines
3. Local Descriptor Table (LDT)
= Is optional
= Extends range of GDT
= Setup for individual task
As we know, the descriptors are stored in the descriptor tables. But it is important to
know where these tables are stored. It is possible to place descriptor tables anywhere
in the processor's address space and it is not necessary to keep them together. Each of the
tables has a register associated with it : the GDTR, the LDTR and the IDTR.
The GDTR and the IDTR each contain the 32-bit linear address of the base of their descriptor
table and the table's limit; the LDTR, as explained below, holds a selector instead. The base
address of a descriptor table is the linear address of the first byte of the first descriptor
in the table. The limit specifies how long the table is and therefore how many descriptors it has.
Global Descriptor Table Register (GDTR) :
Fig. 4.13 shows how the contents of the global descriptor table register are used to
define a Global descriptor table in the Pentium processor physical memory address space.
GDTR is a 48-bit register located inside the Pentium processor. The lower two bytes of this
register specify the LIMIT (in bytes) for the GDT. The value of the limit is one less than the
actual size of the table. For example, if LIMIT is 03FFH then the table is 1024 (1023 + 1)
bytes in length (03FFH = 1023 decimal). Since the LIMIT field is 16 bits long, the GDT can grow
up to 65,536 bytes long. The upper four bytes of GDTR specify the 32-bit linear address
of the base of the Global Descriptor Table (GDT).
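The 48-bit layout of GDTR can be illustrated with a small sketch. It packs a base and a limit into a six-byte image (limit in the lower two bytes, base in the upper four bytes) and recovers the table size from the limit. The function names and the base value 00100000H are assumptions made only for this illustration.

```python
import struct


def pack_dtr(base: int, limit: int) -> bytes:
    """Build a 6-byte image: 16-bit limit first, then 32-bit base (little-endian)."""
    return struct.pack("<HI", limit, base)


def unpack_dtr(image: bytes):
    """Split a 6-byte image back into (base, limit)."""
    limit, base = struct.unpack("<HI", image)
    return base, limit


def table_size(limit: int) -> int:
    """The limit is one less than the table size in bytes."""
    return limit + 1


image = pack_dtr(base=0x00100000, limit=0x03FF)   # base value is hypothetical
print(len(image), unpack_dtr(image))
print(table_size(0x03FF))        # size in bytes for LIMIT = 03FFH
print(table_size(0x03FF) // 8)   # number of 8-byte descriptors in that table
print(table_size(0xFFFF))        # largest possible table
```

The same image format applies to IDTR, since both registers hold a 16-bit limit and a 32-bit base.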
Interrupt Descriptor Table Register (IDTR) :
Like Global Descriptor Table Register, Interrupt Descriptor Table Register holds the
16-bit limit and 32-bit linear address of the base of the Interrupt Descriptor Table (IDT).
Fig. 4.14 shows how the contents of the Interrupt Descriptor Table Register are used to
define an Interrupt Descriptor Table (IDT) in the Pentium processor physical memory
address space.
[Figure: the 48-bit GDTR (base in bits 47-16, limit in bits 15-0) locating the Global Descriptor Table in physical memory]
Fig. 4.13 GDTR and GDT
[Figure: the 48-bit IDTR (base in bits 47-16, limit in bits 15-0) locating the Interrupt Descriptor Table in physical memory]
Fig. 4.14 IDTR and IDT
Like GDTR, the IDTR is also 48 bits in length; the lower two bytes define the limit and the
upper 4 bytes define the base address. Since the limit field is two bytes, the IDT can also be
up to 65536 bytes long. But the Pentium processor only supports up to 256 interrupts or
exceptions; therefore, the size of the IDT should not be set to support more than 256
interrupts, i.e. 256 x 8 = 2048 bytes (LIMIT = 07FFH).
Local Descriptor Table Register (LDTR) :
Unlike GDTR and IDTR, the LDTR is a 16-bit register. It does not itself specify any limit or
base address for the table; instead it selects the LDT descriptor stored in
the Global Descriptor Table (GDT). Fig. 4.15 shows the LDTR, GDT and LDT, and how the
contents of LDTR are used indirectly to define a Local Descriptor Table.
[Figure: physical memory from 00000000 to FFFFFFFF; GDTR locates the GDT (Descriptor 0, Descriptor 1, LDT Descriptor), and the LDT descriptor selected by LDTR locates the LDT]
Fig. 4.15 Global and local descriptor tables
LDTR holds a selector that points to an LDT descriptor in the GDT. Whenever a
selector is loaded into the LDTR, the corresponding descriptor is located in the global
descriptor table. The contents of this descriptor define the local descriptor table. The 32-bit
base value defines the starting point of the table in the Pentium processor physical memory
address space and the 16-bit limit specifies the size of the table.
The GDT can contain many LDT descriptors. To put a particular LDT in service, it is
necessary to load the LDTR with the corresponding selector.
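The indirect lookup through LDTR can be modelled in a few lines. The sketch below assumes the standard 8-byte segment descriptor layout (limit 15...0 in bytes 0-1, base 23...0 in bytes 2-4, access rights in byte 5, limit 19...16 in the low nibble of byte 6, base 31...24 in byte 7). The helper names and the sample base and limit values are hypothetical.

```python
def encode_descriptor(base: int, limit: int, access: int = 0x82) -> bytes:
    """Build an 8-byte descriptor; access 82H marks a present LDT descriptor."""
    return bytes([
        limit & 0xFF, (limit >> 8) & 0xFF,                        # limit 15...0
        base & 0xFF, (base >> 8) & 0xFF, (base >> 16) & 0xFF,     # base 23...0
        access,                                                   # access rights
        (limit >> 16) & 0x0F,                                     # limit 19...16
        (base >> 24) & 0xFF,                                      # base 31...24
    ])


def decode_descriptor(d: bytes):
    """Return (base, limit) from an 8-byte descriptor."""
    limit = d[0] | (d[1] << 8) | ((d[6] & 0x0F) << 16)
    base = d[2] | (d[3] << 8) | (d[4] << 16) | (d[7] << 24)
    return base, limit


def locate_ldt(gdt: bytes, ldtr: int):
    """Use the selector in LDTR to fetch the LDT descriptor from the GDT."""
    index = ldtr >> 3                       # upper 13 bits of the selector
    entry = gdt[index * 8:index * 8 + 8]    # 8 bytes per descriptor
    return decode_descriptor(entry)


# GDT with a null descriptor, one unused slot and an LDT descriptor at index 2
gdt = bytes(16) + encode_descriptor(base=0x00200000, limit=0x00FF)
ldt_base, ldt_limit = locate_ldt(gdt, ldtr=0x0010)   # selector 0010H -> index 2
print(hex(ldt_base), hex(ldt_limit))
```

Loading a different selector into LDTR would pick a different LDT descriptor from the same GDT, which is how a particular LDT is put in service.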
For loading the values in the GDTR, IDTR and LDTR registers, the Pentium processor
provides the LGDT, LIDT and LLDT instructions. It also provides the SGDT, SIDT and SLDT
instructions. The SGDT and SIDT instructions copy the 48-bit contents of the GDTR and IDTR
into the six bytes of memory pointed to by the destination operand, while SLDT stores the
16-bit selector held in LDTR. These tables are manipulated by the operating system. Thus, the
instructions used for loading the descriptor table registers are privileged instructions.
4.5.3 More about Segment Registers
From the previous discussion, we know that segment register contents are used as a
selector to select specific descriptor from the descriptor table. This part of the segment
register is visible to the programmer. Fig. 4.16 shows the complete segment register with its
visible and hidden parts. The hidden part is referred to as the segment descriptor cache register.
Using these registers, the Pentium processor stores information from the descriptor, thereby
avoiding the need to consult a descriptor table every time it accesses memory. Segment
register (visible portion) contents are manipulated by programs whereas segment
descriptor cache register (hidden portion) contents are manipulated by the processor. Once the
descriptors are cached, subsequent references to them are performed without any
overhead for loading of the descriptor. This is the biggest advantage of segment descriptor
cache registers.
[Figure: segment registers CS, SS, DS, ES, FS and GS, each with a 16-bit visible selector and a hidden descriptor part]
Fig. 4.16 Segment register and segment descriptor cache
Example 4.1 : Assume (DS) = 0204H and (ESI) = 00002000H. Paging is disabled and the mode
is protected mode.
1. From which of the three descriptor tables (IDT, LDT, GDT) will the descriptor be considered ?
Give the descriptor number.
2. Assume appropriate values in the descriptor selected and explain how the address
translation takes place when the following instruction is executed.
MOV AX, [ESI]
Solution : 1. Here, DS register is used as a selector. Fig. 4.17 shows the definitions of the
selector bits; the contents of DS are 0204H.
[Figure: DS register used as a selector, with INDEX in bits 15-3, TI in bit 2 and RPL in bits 1-0]
Fig. 4.17
From the figure we can see that
INDEX = 0 0000 0100 0000 B = 64
TI = 1
RPL = 00
Since the TI (Table Indicator) bit is set, descriptor number 64 from the current LDT will be
referred.
2. The descriptor gives the segment base address and segment limit. Let us assume
segment base address = 0000 0000 H and limit = FFFFFH. As paging is disabled, the
physical address of memory is given by
PA = Base address + Offset
Note : Offset ≤ segment limit.
In our case the offset is given by ESI (0000 2000H), which is within the limit, i.e. less than
the segment limit. Therefore the physical address of memory,
PA = 0000 0000 H + 0000 2000 H
   = 0000 2000 H
When the MOV AX, [ESI] instruction is executed the contents from memory location 0000
2000H are copied into AL register and contents from memory location 0000 2001H are
copied into AH register.
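The steps of Example 4.1 can be traced with a short sketch. The helper names are illustrative, and the two memory bytes (34H and 12H) are made-up sample contents, not values given in the example.

```python
def decode_selector(selector: int) -> dict:
    """Split a 16-bit selector into INDEX (bits 15-3), TI (bit 2), RPL (bits 1-0)."""
    return {"index": selector >> 3, "ti": (selector >> 2) & 1, "rpl": selector & 3}


def segment_translate(base: int, limit: int, offset: int) -> int:
    """Base + offset with a limit check; with paging disabled this is the physical address."""
    if offset > limit:
        raise ValueError("offset beyond segment limit")
    return (base + offset) & 0xFFFFFFFF


def read_word(memory: dict, address: int) -> int:
    """Little-endian 16-bit read: low byte goes to AL, high byte to AH."""
    return memory[address] | (memory[address + 1] << 8)


fields = decode_selector(0x0204)                         # (DS) = 0204H
pa = segment_translate(0x00000000, 0xFFFFF, 0x00002000)  # assumed base and limit
memory = {0x2000: 0x34, 0x2001: 0x12}                    # hypothetical contents
ax = read_word(memory, pa)
print(fields, hex(pa), hex(ax))
```

Here TI = 1 selects the LDT and the index identifies the descriptor within it, which matches the reasoning of the solution.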
4.6 Paging
Paging or page translation is the second phase of address translation. In this phase the
Pentium processor transforms a linear address generated by segment translation into a
physical address. The page translation step is optional. Page translation is in effect only
when the PG bit of CR0 is set. Page translation is a must if the operating system is to
implement multiple virtual 8086 tasks, page-oriented protection, or page-oriented virtual
memory.
When paging is enabled, the paging unit arranges the physical address space into
1,048,576 pages that are each 4096 bytes long. Fig. 4.18 shows organization of physical
address space using paging.
[Figure: physical address space divided into 4 KB pages, up to page 1,048,575]
Fig. 4.18 Paged organization of the physical address space
4.6.1 Support Registers and Tables
There are three components to the paging mechanism of the Pentium processor : Page
directory, the page tables, and the page itself (page frame or page). Like segmentation,
paging depends on special memory resident tables. Out of three components, page
directory and page tables are in the table form. Both are made up of 32-bit descriptors.
Unlike tables of segment descriptors, each page directory or page table must contain
exactly 1024 descriptors, making each directory or table exactly 4096 bytes (4KB) long. A
page frame is a 4 Kbyte unit of contiguous addresses of physical memory.
When paging is enabled the linear address generated by the segment translation process is
not used as a physical address. The Pentium processor uses two levels of tables to
translate the linear address (from the segment translation) into a physical address.
Fig. 4.19 shows the format of linear address.
[Figure: linear address with DIRECTORY in bits 31-22, PAGE in bits 21-12 and OFFSET in bits 11-0]
Fig. 4.19 Linear address format
Processor internally divides a linear address into three fields : Two fields of 10 bits each
and one field of 12 bits. The most significant 10 bits (DIR field) of the linear address are
used as an index into a page directory. The next most significant 10 bits (PAGE field) of the
linear address are used as an index into the page table determined by the page directory.
The least significant 12 bits (OFFSET) select one of 4096 bytes of memory from the page frame
determined by the page table. The physical address of the current page directory is stored
in the control register (CR3) which is also referred to as page directory base register
(PDBR). Fig. 4.20 shows how the Pentium processor converts the DIR, PAGE and OFFSET
fields of a linear address into the physical address by consulting two levels of page tables.
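The division of a linear address into its three fields is simple bit manipulation, as the following sketch shows. The function name and the sample address 00402ABCH are chosen only for illustration.

```python
def split_linear(linear: int):
    """Return (DIR, PAGE, OFFSET) for a 32-bit linear address."""
    dir_field = (linear >> 22) & 0x3FF    # bits 31-22, index into page directory
    page_field = (linear >> 12) & 0x3FF   # bits 21-12, index into page table
    offset = linear & 0xFFF               # bits 11-0, byte within the page frame
    return dir_field, page_field, offset


print(split_linear(0x00402ABC))   # sample address
print(split_linear(0xFFFFFFFF))   # highest linear address
```

Each 10-bit field can select one of 1024 entries, which is why every page directory and page table holds exactly 1024 descriptors.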
[Figure: linear address fields indexing, in turn, the page directory (located by CR3), the page table and the page frame]
Fig. 4.20 Linear to physical address translation
The descriptor in a page directory is referred to as a Page Directory Entry (PDE) and
descriptor in the page table is referred to as Page Table Entry (PTE).
4.6.2 PDE Descriptor
Fig. 4.21 shows the format of a page directory entry. A page directory entry has six
fields.
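[Figure: page directory entry with Page Table Address in bits 31-12, User/Avail in bits 11-9, and the Accessed, User/Supervisor, Read/Write and Present bits]
Fig. 4.21 Page directory entry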
Page Table Address :
The page table address specifies the physical starting address of the base of a page
table. This field (page table address) specifies the 20 most significant bits and the remaining 12
bits are all 0's. This locates all page tables on 4K boundaries.
User/Avail :
Bits 9, 10, and 11 are not used by the Pentium processor. Users are free to use them.
Accessed Bit :
The Pentium processor automatically sets the accessed bit whenever the PDE is used in an
address translation or another page related function. It is never cleared unless you write
code to do it manually.
User/Supervisor and Read/Write Bits :
These bits are not used for address translation, but are used for page-level protection
which the Pentium processor performs at the same time as address translation.
If User/Supervisor bit is set, the memory pages covered by this PDE are accessible
from all privilege levels. If it is cleared, the pages are accessible only by PL0, 1 and 2.
If User/Supervisor bit is cleared, Read/Write bit has no effect. But if User/Supervisor
bit is set and Read/Write bit is 0, memory pages covered by this PDE are write protected.
If Read/Write bit is set, write privileges are allowed from PL3 code. The access rights just
discussed are summarized in Table 4.2.
U/S   R/W   Permitted Access    Permitted Access
            Level 3             Levels 0, 1, or 2
0     0     None                Read/Write
0     1     None                Read/Write
1     0     Read-Only           Read/Write
1     1     Read/Write          Read/Write
Table 4.2 Protection provided by R/W and U/S
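Table 4.2 can be expressed as a small function. This is only a sketch of the table for a single entry; the function name is illustrative, and the way the PDE and PTE bits combine is not modelled here.

```python
def permitted_access(us: int, rw: int, cpl: int) -> str:
    """Access permitted by one entry's U/S and R/W bits for code running at cpl."""
    if cpl < 3:
        return "Read/Write"      # levels 0, 1 and 2 are not restricted by these bits
    if us == 0:
        return "None"            # supervisor-only page, not accessible from level 3
    return "Read/Write" if rw else "Read-Only"


for us in (0, 1):
    for rw in (0, 1):
        print(us, rw, permitted_access(us, rw, cpl=3), permitted_access(us, rw, cpl=0))
```

Note that the R/W bit matters only when U/S = 1, exactly as the table shows.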
Present :
The present bit indicates whether a page directory entry can be used in address
translation. P = 1 indicates that the entry can be used and the page table pointed to by the PDE is
present in the physical memory. If P = 0, the page table referred to is not present. Fig. 4.22
shows the format of a not present page descriptor.
[Figure: not present page descriptor, with bits 31-1 available and bit 0 (P) equal to 0]
Fig. 4.22 Not present page descriptor
4.6.3 PTE Descriptor
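Fig. 4.23 shows the format of a page table entry. A page table entry has seven fields.
[Figure: page table entry with Page Frame Address in bits 31-12, User/Avail in bits 11-9, and the Dirty, Accessed, User/Supervisor, Read/Write and Present bits]
Fig. 4.23 Page table entry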
Page Frame Address :
The page frame address specifies the physical starting address of a 4 KB page frame or
a page. This field (page frame address) specifies 20 most significant bits and remaining 12
bits are all 0's. This locates all pages on 4K boundaries.
User/Avail Bits :
Bits 9, 10, 11 are not used by the Pentium processor. Users are free to use them.
Accessed Bit :
Accessed bit is set by the Pentium processor whenever this PTE is used in a paging
related function. The Pentium processor never clears this bit. User can keep track of the
most often used pages of memory by periodically testing and clearing this bit in all PTEs.
Dirty Bit :
The dirty bit is automatically set by the Pentium processor whenever page frame
covered by PTE is written into. The Pentium processor never clears this bit. User can keep
track of the most often written page of memory by periodically testing and clearing this
bit.
User/Supervisor and Read/Write Bits :
These bits are not used for address translation, but are used for page-level protection
which the Pentium processor performs at the same time as address translation.
If User/Supervisor bit is set, the memory pages covered by this PTE are accessible
from all privilege levels. If it is cleared, the pages are accessible only by PL0, 1 and 2.
If User/Supervisor bit is cleared, Read/Write bit has no effect. But if User/Supervisor
bit is set and Read/Write bit is 0, memory pages covered by this PTE are write protected. If
Read/Write bit is set, write privileges are allowed from PL3 code.
Present :
The present bit indicates whether a page table entry can be used in address
translation. P = 1 indicates that the entry can be used and the page frame pointed to by the PTE is
present in the physical memory. If P = 0, the page referred to is not present.
Fig. 4.24 shows both the phases of address translation. It shows how a logical address
is converted into a physical address when paging is enabled.
[Figure: segment translation adds the 32-bit base address to the offset to form the linear address, which is then translated through the page directory (located by CR3), the page table and the page frame]
Fig. 4.24 Protected mode address translation
4.7 Translation Lookaside Buffer or Page Translation Cache
The Pentium processor paging mechanism is designed to support demand paged
virtual memory systems. However, performance would degrade substantially if the
processor was required to access two levels of tables (Page directory and page table) for
every memory access. To solve this problem, the Pentium processor stores the most
recently used page table entries in an on-chip cache. This cache is called the Translation
Lookaside Buffer (TLB). The TLB holds up to 32 page table entries. The 32-entry TLB,
coupled with a 4K page size, results in coverage of 128K bytes of memory addresses.
Whenever program generates linear address that maps to a page table entry (PTE) already
in the cache, the Pentium processor can use the cached information it has internally. This
saves two outside memory references, improving performance in address translation. For
many common multi-tasking systems, the TLB will have a hit rate of about 98%. This
means that the processor will only have to access the two-level page structure on 2% of all
memory accesses. Fig. 4.25 illustrates how the TLB supports the Pentium processor paging
mechanism.
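The figures quoted above follow from simple arithmetic, sketched below. The 98% hit rate is the figure quoted in the text; the function name is illustrative, and the model counts only the two table reads on a miss.

```python
TLB_ENTRIES = 32
PAGE_SIZE = 4096                      # 4K bytes per page

coverage = TLB_ENTRIES * PAGE_SIZE    # bytes of address space covered by the TLB


def extra_reads_per_access(hit_rate: float, levels: int = 2) -> float:
    """Average number of page-structure reads added to each memory access."""
    return (1.0 - hit_rate) * levels


print(coverage // 1024, "Kbytes")
print(extra_reads_per_access(0.98))   # with the TLB at a 98% hit rate
print(extra_reads_per_access(0.0))    # without a TLB every access walks both tables
```

With a 98% hit rate the two-level walk costs on average only a few hundredths of an extra memory read per access, compared with two extra reads per access without the cache.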
[Figure: the linear address is looked up in the 32-entry translation lookaside buffer; on a miss the page directory and page table are consulted to reach physical memory]
Fig. 4.25 Translation lookaside buffer
4.8 Paging Operation
The paging mechanism receives a 32-bit linear address from the segmentation unit. The
upper 20-bits of linear address are compared with all 32 entries in the TLB to determine if
there is a match. If there is a match (i.e. a TLB hit), then the 32-bit physical address is
calculated and will be placed on the address bus.
However, if the page table entry is not in the TLB, the Pentium processor reads the
appropriate Page Directory Entry. If P = 1 on the Page Directory Entry, indicating that the
page table is in memory, then the Pentium processor reads the appropriate Page Table
Entry and sets the Accessed bit. If P = 1 on the Page Table Entry, indicating that the page is in
memory, the Pentium processor updates the Accessed and Dirty bits as needed and fetches the
operand. The Pentium processor then stores the upper 20 bits of the linear address, together
with the page frame address read from the page table, in the TLB for future accesses. However,
if P = 0 for either the Page Directory Entry or the Page Table Entry, then the Pentium processor
generates a page fault, an exception 14.
The Pentium processor also generates an exception 14, page fault, if the memory access
violates the page protection attributes (i.e. U/S or R/W), e.g. trying to write to a read-only
page.
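The sequence just described can be sketched as a small software model. It is a simplification under stated assumptions: memory is a dictionary from the physical address of a 32-bit entry to its value, the TLB is a plain dictionary keyed by the upper 20 bits of the linear address, and the U/S and R/W protection checks are left out. The names page_translate and PageFault are hypothetical.

```python
P, A, D = 0x001, 0x020, 0x040     # Present (bit 0), Accessed (bit 5), Dirty (bit 6)


class PageFault(Exception):
    """Stands in for exception 14."""


def page_translate(linear, cr3, mem, tlb, write=False):
    """Translate a 32-bit linear address to a physical address."""
    vpn, offset = linear >> 12, linear & 0xFFF
    if vpn in tlb:                                   # TLB hit: no table access
        return tlb[vpn] | offset
    pde_addr = cr3 + 4 * (linear >> 22)              # DIR field indexes the directory
    pde = mem.get(pde_addr, 0)
    if not pde & P:
        raise PageFault("page table not present")
    mem[pde_addr] = pde | A
    pte_addr = (pde & 0xFFFFF000) + 4 * ((linear >> 12) & 0x3FF)   # PAGE field
    pte = mem.get(pte_addr, 0)
    if not pte & P:
        raise PageFault("page not present")
    mem[pte_addr] = pte | A | (D if write else 0)
    tlb[vpn] = pte & 0xFFFFF000                      # cache the page frame address
    return tlb[vpn] | offset


# Hypothetical tables: directory at 1000H, one page table at 2000H, frame at 5000H
mem = {0x1004: 0x2000 | P, 0x2008: 0x5000 | P}
tlb = {}
print(hex(page_translate(0x00402ABC, 0x1000, mem, tlb)))   # miss, walks both tables
print(hex(page_translate(0x00402000, 0x1000, mem, tlb)))   # hit, same page
```

In this simplified model the Dirty bit is updated only during a table walk; a real processor also handles the first write to a page that is already cached.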
If Pentium processor wants to access the physical memory space whose information is
not in the cache then the Pentium processor examines the 32 existing cache entries and
throws out the least recently used PTE. It then puts new PTE in its place. This method of
updating cache is known as LRU (Least Recently Used).
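The LRU replacement policy can be illustrated with a small model built on Python's OrderedDict. This is only a sketch of the policy; the class name LruTlb is hypothetical and the model says nothing about how the hardware implements it.

```python
from collections import OrderedDict


class LruTlb:
    """Translation cache that evicts the least recently used entry when full."""

    def __init__(self, capacity: int = 32):
        self.capacity = capacity
        self.entries = OrderedDict()      # oldest entry first, newest last

    def lookup(self, vpn):
        """Return the cached frame address, or None on a miss."""
        if vpn not in self.entries:
            return None
        self.entries.move_to_end(vpn)     # mark as most recently used
        return self.entries[vpn]

    def insert(self, vpn, frame):
        """Add a translation, throwing out the LRU entry if the cache is full."""
        if vpn in self.entries:
            self.entries.move_to_end(vpn)
        elif len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)
        self.entries[vpn] = frame

    def flush(self):
        """Discard every entry, as a reload of CR3 does."""
        self.entries.clear()


tlb = LruTlb(capacity=2)                  # tiny capacity to make eviction visible
tlb.insert(1, 0x1000)
tlb.insert(2, 0x2000)
tlb.lookup(1)                             # entry 1 becomes most recently used
tlb.insert(3, 0x3000)                     # entry 2 is the LRU entry and is evicted
print(tlb.lookup(2), tlb.lookup(1), tlb.lookup(3))
```

The flush method corresponds to the cache flush that is needed whenever the page tables are changed.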
It is necessary to flush the entire cache whenever the page tables are changed. The
page-translation cache is invisible to application programmers but it is visible to
system programmers. Thus system programmers can flush the cache by using the following
methods.
1. By reloading CR3 with a MOV instruction.
For Example : MOV CR3, EAX
2. By performing a task switching to a TSS that has a different CR3 image than the
current TSS (Task Switching is explained in more detail later in this chapter).
4.9 Protection
Problems may occur in multitasking operating systems or multi-user systems when
two or more users attempt to read and change the contents of a memory location at the
same time. The section of a program where the value of a variable is being read and
changed (critical section) must be protected from access by other tasks until the operation
is complete.
Another region that requires protection is the operating system code. The incorrect
address in a user program may cause program to write over the critical sections of the
operating system corrupting the operating system code and data areas. The system then
'locks up' and the only way to get control again is to reboot the system. In a multitasking
system this is intolerable, so several methods are used to protect the operating system.
The Pentium processor uses segment level protection and privilege level protection
mechanisms to protect critical sections. When an attempt is made to access a segment by
loading a segment selector into the visible part of a segment register, the protection
mechanism of Pentium processor makes several checks such as type checking, limit
checking, privilege level checking and so on. In this section we are going to study the
protection mechanism provided by the Pentium processor to keep the system relatively safe
from accidental mishaps.
4.9.1 Protection By Segmentation
When an attempt is made to access a segment, the Pentium processor first checks
to see if the descriptor table indexed by the selector contains a valid descriptor for that
selector. If the selector attempts to access a location outside the limit of the descriptor table
or the location indexed by the selector in the descriptor table does not contain a valid
descriptor, then an exception is produced.
The Pentium processor also checks to see if the segment descriptor is of the right type
to be loaded into the specified segment register cache. The descriptor for a read-only data
segment, for example cannot be loaded into the SS register, because a stack must be able
to be written to. A selector for a code segment which has been marked “execute only"
cannot be loaded into the DS register to allow reading the contents of the segment.
If all above protection conditions are met, the limit, base, and access rights bytes of the
segment descriptor are copied into the hidden part of the segment register. The Pentium
processor then checks the P (Present) bit of the access byte to see if the segment for that
descriptor is present. If it is not present, a type 11 exception is generated.
After a segment selector and descriptor are loaded into a segment register, further
checks are made each time a location in the actual segment is accessed. These checks are
type checking and limit checking.
Type Checking
Type field of the descriptor specifies type of the descriptor and the intended usage of
the segment. As mentioned in the previous section, W (Writeable), R (Readable), C
(Conforming), A (Accessed) and E (Expand-Down) bits from type field specify the usage
of the segment and restrict segment for particular use only. For example, if the W bit of a
data segment is 0, the segment is a read-only segment. Its access is limited to reading only.
Type checking is used to detect whether any program is attempting to use segments in
ways not intended by the programmer.
Limit Checking
The Pentium processor uses limit field of a segment descriptor to prevent programs
from addressing outside the segments. It interprets limit field depending on the setting of
the G (granularity) bit, which specifies whether limit value counts 1 byte or 4 Kbytes. In
case of data segments processor also checks ED (Expansion direction) bit and B (Big) bit.
For all types of segments except expand-down data segments, the value of the limit is one less
than the size (expressed in bytes) of the segment. The Pentium processor causes a general
protection exception when a program attempts to
= Access memory byte at an address > limit
= Access memory word at an address ≥ limit
= Access memory Dword at an address ≥ (limit - 2)
For expand-down data segments, the limit is interpreted differently. In these cases the
range of valid addresses is from limit + 1 to either 64K or 2^32 - 1 (4 Gbyte) depending on
the B-bit.
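The three limit rules are all instances of one test: the last byte of the operand must not lie beyond the limit. The sketch below shows this under some assumptions that are not spelled out in the text: with G = 1 the 20-bit limit is scaled to (limit x 4096) + 4095, and for expand-down segments the highest valid address is taken as FFFFH (B = 0) or FFFFFFFFH (B = 1). The function names are illustrative.

```python
def effective_limit(limit: int, g: int) -> int:
    """Limit in bytes: unchanged for G = 0, scaled to 4 Kbyte units for G = 1."""
    return ((limit << 12) | 0xFFF) if g else limit


def access_ok(offset, size, limit, g=0, expand_down=False, b=1):
    """True if an access of `size` bytes at `offset` passes limit checking."""
    lim = effective_limit(limit, g)
    last = offset + size - 1                 # address of the last byte accessed
    if not expand_down:
        return last <= lim
    upper = 0xFFFFFFFF if b else 0xFFFF      # assumed top of an expand-down segment
    return offset > lim and last <= upper


LIMIT = 0x0FFF
print(access_ok(LIMIT, 1, LIMIT))        # byte at address = limit
print(access_ok(LIMIT, 2, LIMIT))        # word at address = limit
print(access_ok(LIMIT - 2, 4, LIMIT))    # Dword at address = limit - 2
print(access_ok(LIMIT - 3, 4, LIMIT))    # Dword at address = limit - 3
```

A False result corresponds to the general protection exception described above.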
4.9.2 Privilege Level Protection
The Pentium processor has four levels of protection which are optimized to support
the needs of a multi-tasking operating system to isolate and protect user programs from
each other and the operating system. The four levels of protection are four privilege levels,
numbered from 0 to 3. The value 0 represents the highest privilege level and the value 3
represents the lowest privilege level. Fig. 4.26 shows how a Pentium processor protected mode
system can be set up with four privilege levels. It shows that the operating system kernel is
assigned the highest privilege level, which is privilege level 0 (PL0). The system
services such as BIOS procedures are assigned PL1, whereas custom device drivers
are assigned PL2 and finally application programs are assigned PL3.
[Figure: privilege levels shown for several tasks, with custom extensions and applications at the less privileged levels]
Fig. 4.26 Assignment of privilege levels
The Pentium processor assigns these levels to different objects such as descriptors and
selectors. The assigned privilege levels are stored in the respective fields as given below.
= Descriptors contain a field called the descriptor privilege level (DPL).
= Selectors contain a field called the requester's privilege level (RPL). The RPL is
intended to represent the privilege level of the procedure that originates a selector.
= The Pentium processor stores the descriptors in the internal cache (hidden portion
of segment registers) for currently executing segments. Privilege levels for such
descriptors are referred to as current privilege level (CPL).
Now we see how Pentium processor evaluates the right of a procedure to access
another segment and thus how it achieves the remaining aspects of protection.
4.9.2.1 Restricting Access to Data
When an attempt is made to access a data segment by loading a segment selector into
the visible portion of a data segment register (DS, ES, FS, GS, SS), the Pentium processor
automatically makes several checks by comparing privilege levels. The Pentium processor
checks three different types of privilege levels as shown in Fig. 4.27.
[Figure: the CPU privilege check compares the CPL, the RPL of the target segment selector and the DPL of the data segment descriptor]
CPL - Current Privilege Level
RPL - Requester's Privilege Level
DPL - Descriptor Privilege Level
Fig. 4.27 Privilege check for data access
1. The CPL (Current Privilege Level)
2. The RPL (Requester’s Privilege Level) of the selector used to specify the target
segment.
3. The DPL of the descriptor of the target segment
A program can load a data segment register only if the DPL of the target segment is
numerically greater than or equal to the maximum of the CPL and the selector's RPL. In
other words, a procedure can only access data that is at the same or less privileged level.
Table 4.3 gives an exact idea about data access.
No.   Privilege Levels    Access
      DPL   CPL   RPL
1     2     0     1       Valid
2     3     1     2       Valid
3     1     1     0       Valid
4     1     2     0       Invalid
5     2     2     3       Invalid
Table 4.3 Data accesses
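The rule for data access reduces to one comparison, and the rows of Table 4.3 can be reproduced from it. The function name below is illustrative.

```python
def can_load_data_segment(dpl: int, cpl: int, rpl: int) -> bool:
    """Data access is allowed only if DPL >= max(CPL, RPL), numerically."""
    return dpl >= max(cpl, rpl)


# (DPL, CPL, RPL) rows of Table 4.3
rows = [(2, 0, 1), (3, 1, 2), (1, 1, 0), (1, 2, 0), (2, 2, 3)]
for number, (dpl, cpl, rpl) in enumerate(rows, start=1):
    result = "Valid" if can_load_data_segment(dpl, cpl, rpl) else "Invalid"
    print(number, dpl, cpl, rpl, result)
```

In row 5 the CPL alone would be sufficient, but the less privileged RPL of the selector makes the access invalid.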
4.9.2.2 Accessing Data in Code Segments
It is possible to read data from code segment. There are three ways of reading data
from code segments.
1. Load a data segment register with a selector of a non conforming, readable,
executable segment.
2. Load a data segment register with a selector of a conforming, readable, executable
segment.
3. Use a CS override prefix to read a readable, executable segment whose selector is
already loaded in the CS register.
In case 1, a procedure can only access data that is at the same or less privileged level.
Case 2 is always valid because the privilege level of a segment whose conforming bit is set
is effectively the same as CPL, regardless of its DPL.
Case 3 is also always valid because the DPL of the code segment in CS is, by definition,
equal to CPL.
[Figure: the CPU privilege check compares the CPL, the RPL of the target segment selector and the DPL of the code segment descriptor]
CPL - Current Privilege Level
RPL - Requester's Privilege Level
DPL - Descriptor Privilege Level
Fig. 4.28 Privilege check for control transfer
4.9.2.3 Restricting Control Transfers
The Pentium processor can transfer program control with the help of JMP, CALL, RET,
INT and IRET instructions. The “near” forms of JMP, CALL and RET transfer control
within the current segment so these are subjected to only limit checking. But in case of far
JMP, CALL and RET transfers, control is transferred to other segment. In such cases
Pentium processor performs privilege checking.
To successfully transfer the control to another segment, both the RPL and the CPL must
be a number less than or equal to the DPL of the segment. In other words, the requesting
selector and the currently executing code must both be at least as privileged as the
desired segment.
Max (CPL, RPL) ≤ DPL
4.9.3 Inter-privilege Level Transfer of Control
After looking at all these restrictions, the question that might come to mind at this point
is, if a task cannot access a segment with a more privileged (numerically less) DPL, how
can user programs access the operating system kernel, BIOS, or utility procedures in
segments which have more privileged (numerically less) DPLs ? There are two ways to
access a procedure located in a segment which has a higher privilege level.
1. The first option has a restriction that the segment which has a higher privilege
level must be a conforming code segment.
2. The second option is more complex, but allows to access the segment which has a
higher privilege level using special structure known as Call Gate.
4.9.3.1 Conforming Code Segment
A code segment is considered conforming if bit 2 of the access rights byte of its
descriptor is set. Conforming code segments have no inherent privilege level of their own;
they conform to the level of the code that CALLs them or JMPs to them. For example, if a
program in a PL3 segment transfers control to a conforming code segment, then the
conforming code runs with CPL equal to 3. If the same segment is invoked by PL0 code, it
runs with a CPL of 0.
When the control is transferred to a conforming code segment, the RPL bits of register
CS are not changed to match the DPL of segment, as they normally would be. Instead,
they still reflect the correct CPL, that is, the DPL of the last non-conforming code segment that
was executed. This is the only time that the RPL bits in the CS register might not match
the DPL bits in the currently executing segment.
Even though conforming code segments do not have any particular privilege level
associated with them, there is still one restriction regarding when a conforming segment
can be used. The DPL of the conforming descriptor must always be less than or equal to
the current CPL. You can never transfer control to a segment whose DPL is greater (less
privileged) than the current segment. This is required because control must later return
from the conforming code segment to the original segment; the conforming code segment
must have the same or a higher privilege level than the original segment for that return
to be allowed. Table 4.4 gives an exact idea about access of conforming code segments.
No.   Current Privilege   DPL of Conforming   Access
      Level (CPL)         Code Segment
1     3                   2                   Valid
2     2                   0                   Valid
3     1                   1                   Valid
4     1                   2                   Invalid
5     2                   3                   Invalid
Table 4.4 Accessing of conforming code segment
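The restriction on conforming code segments, and the fact that the CPL does not change, can be sketched as follows. The function names are illustrative, and the rows are those of Table 4.4.

```python
def can_enter_conforming(cpl: int, dpl: int) -> bool:
    """A conforming segment may be entered only if its DPL <= CPL, numerically."""
    return dpl <= cpl


def cpl_after_transfer(cpl: int, dpl: int) -> int:
    """Conforming code runs at the caller's privilege level; CPL is unchanged."""
    if not can_enter_conforming(cpl, dpl):
        raise PermissionError("conforming segment is less privileged than the caller")
    return cpl


# (CPL, DPL of conforming code segment) rows of Table 4.4
rows = [(3, 2), (2, 0), (1, 1), (1, 2), (2, 3)]
for number, (cpl, dpl) in enumerate(rows, start=1):
    print(number, cpl, dpl, "Valid" if can_enter_conforming(cpl, dpl) else "Invalid")
```

In row 1, for instance, the code in the conforming segment continues to run with CPL equal to 3 even though the segment's DPL is 2.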
4.9.3.2 Call Gates
A call gate is simply a special type of descriptor as shown in Fig. 4.29. Unlike code, data,
or stack descriptors or the system type LDT descriptor, call gate descriptors do not define
any memory space. They have no base address or limit fields. Actually, they are not
descriptors at all, but it is convenient to place them in descriptor tables. A call gate acts as an
interface layer between code segments at different privilege levels. The “call gate” is the
only mechanism that allows a procedure located in any segment (conforming or
non-conforming) which has a higher privilege level to be called. JMPs are not allowed, hence
the name “call gate”. It is important that the CALL refers to a call gate, not the destination code
segment. The call gate defines the code segment and the exact offset where the control is
to be transferred. Users are not allowed to specify the desired offset in their programs,
because a wrong offset may corrupt the procedure if control is transferred into the
middle of a subroutine or, worse yet, into the middle of an instruction. A call gate
descriptor is put in the GDT or in an LDT, just as segment and other descriptors are. When a
program does a CALL to a procedure in another segment, the selector for that segment's
call gate is placed into the visible portion of CS register, and the call gate descriptor is
placed in the hidden portion of CS register.
[Figure: call gate descriptor with Selector and Offset fields]
Fig. 4.29 Format of Pentium processor call gate
The call gate descriptor contains two important things :
1. Selector which points to the descriptor for the segment where the procedure is
actually loaded.
2. Offset of the called procedure in its segment.
If the call is valid, the selector from the call gate (which points to the descriptor for the
segment where the procedure is actually loaded) is placed in the visible portion of the CS
register, and the corresponding segment descriptor is loaded into the hidden portion of the
CS register. The Pentium processor then uses the base address from the segment descriptor
and the offset from the call gate descriptor to calculate the linear address of the called
procedure (which is the physical address when paging is disabled), as shown in Fig. 4.30.
Fig. 4.30 Indirect transfer via call gate
Call Gates
• Are defined like segment descriptors
• Do not define any memory space
• Occupy a slot in the descriptor tables
• Provide the only means to alter the current privilege level
• Define entry points to other privilege levels
• Must be invoked with a CALL instruction
During this process the validity of the control transfer is checked using four different
privilege levels :
1. The CPL (Current Privilege Level)
2. The RPL (Requester’s Privilege Level) of the selector used to specify the call gate
3. The DPL of the gate descriptor
4. The DPL of the descriptor of the target executable segment
For a valid control transfer, the following privilege rule for the CALL instruction must be
satisfied, as shown in Fig. 4.31 (a).
Target DPL ≤ Max (RPL, CPL) ≤ Gate DPL
For example, if you are running in a PL2 code segment (CPL = 2), and you want to call
a PL0 procedure (target DPL = 0), you must use a gate to that procedure with a DPL of 2
or 3. Fig. 4.31 (b) shows some valid accesses to higher privileged levels.
CPL - Current Privilege Level
RPL - Requestor's Privilege Level
DPL - Descriptor Privilege Level
Fig. 4.31 (a) Privilege check via call gate
Privilege requirements to use a call gate :
• Call gate DPL must be numerically greater than or equal to the current privilege
level.
• Call gate DPL must be numerically greater than or equal to the RPL of the gate
selector.
• Call gate DPL must be numerically greater than or equal to the target code
segment DPL.
• Target code segment DPL must be numerically less than or equal to the current
privilege level.
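The four requirements listed above can be combined into one check. The following Python sketch is a simplified model under the assumption that all four conditions are evaluated together; the function name call_gate_allowed and its parameter names are hypothetical and are not part of any Intel interface.

```python
def call_gate_allowed(cpl: int, rpl: int, gate_dpl: int, target_dpl: int) -> bool:
    """Apply the four call gate privilege requirements.

    All arguments are privilege numbers from 0 (most privileged) to 3.
    """
    for level in (cpl, rpl, gate_dpl, target_dpl):
        if level not in range(4):
            raise ValueError("privilege levels must be 0..3")
    checks = (
        gate_dpl >= cpl,         # the caller is privileged enough to use the gate
        gate_dpl >= rpl,         # the gate selector's RPL may use the gate
        gate_dpl >= target_dpl,  # the gate leads to equal or higher privilege
        target_dpl <= cpl,       # a CALL never lowers the privilege level
    )
    return all(checks)


# Example from the text: CPL = 2 code calling a PL0 procedure.
for gate_dpl in (1, 2, 3):
    allowed = call_gate_allowed(cpl=2, rpl=2, gate_dpl=gate_dpl, target_dpl=0)
    print(f"gate DPL={gate_dpl}: {'allowed' if allowed else 'refused'}")
```

With CPL = 2 and a target at PL0, the gates with DPL 2 and DPL 3 pass the check and the gate with DPL 1 is refused, which agrees with the example given in the text.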
Fig. 4.31 (b) Some valid accesses to higher privileged levels using call gates
With call gates, the procedure is accessed indirectly, through the call gate descriptor,
rather than directly through a segment descriptor. This indirect access has two major
advantages.
1. This approach permits another level of privilege checking before access to the
procedure in the higher privileged segment. The privilege level of the calling
program (CPL) is compared with the DPL of the call gate. If the privilege level of the
calling program (CPL) is numerically greater than the DPL of the call gate, the access
is not allowed.
2. User programs cannot accidentally enter higher privileged segments at an arbitrary
point. If they are going to enter at all, they must enter at the specific offset
contained in the call gate descriptor.
4.9.4 Changing Stacks
The change in privilege level changes the address domain of the program. The
Pentium processor also changes stacks whenever the privilege level changes. When a call gate
causes a change in privilege, the stack segment and stack pointer are saved, and a new stack
is used that corresponds to the new, inner privilege level. When control is returned to the
outer level code, the original stack is restored. For a valid call through a gate to a more
privileged level, the Pentium processor takes the segment selector and the pointer for the
new stack from the TSS (Note : TSS is discussed in section 5.3). If the user is calling a
procedure with privilege level 1 (PL1), the new stack selector and stack pointer are taken
from SS1 and ESP1, respectively.
The old stack selector and stack pointer are immediately pushed onto this new stack.
The Pentium processor then reads the WC (Word Count) field of the call gate descriptor to
find the number of double word (32-bit) entries to be copied from the old stack to the new
stack. The WC field thus decides the number of parameters passed to the new stack. After
this, the old CS selector and EIP offset are pushed onto the new stack. Finally, CS is loaded
from the selector field of the call gate descriptor, EIP is loaded from the offset field, and
execution starts at the new address.
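The order of the pushes described above can be traced with a small Python model. It is a sketch under simplifying assumptions: the new stack is modelled as a dictionary from address to value, every entry is one double word, and the parameters are copied in the order the caller pushed them. The function name gate_call_stack and all argument names and sample values are hypothetical.

```python
def gate_call_stack(new_esp, old_ss, old_esp, caller_args, word_count,
                    old_cs, old_eip):
    """Model the pushes made on the inner stack by a call through a gate.

    caller_args lists the double words on top of the old stack in the order
    the caller pushed them; the newest `word_count` of them are copied.
    Returns (final ESP, {address: value}) for the new stack.
    """
    if not 0 <= word_count <= len(caller_args):
        raise ValueError("word_count must not exceed the available arguments")

    stack = {}
    esp = new_esp                # initial value taken from ESPn in the TSS

    def push(value):
        nonlocal esp
        esp -= 4                 # each entry is one 32-bit double word
        stack[esp] = value

    push(old_ss)                 # old stack selector
    push(old_esp)                # old stack pointer
    for value in caller_args[len(caller_args) - word_count:]:
        push(value)              # parameters counted by the WC field
    push(old_cs)                 # return selector
    push(old_eip)                # return offset
    return esp, stack


esp, stack = gate_call_stack(0x1000, 0x23, 0x8000, [0xAAAA, 0xBBBB], 2,
                             0x1B, 0x401000)
for address in sorted(stack, reverse=True):
    print(f"{address:#06x}: {stack[address]:#x}")
```

The printout lists the new stack from its highest address downwards, so the old SS and ESP appear first and the return address (old CS and EIP) appears last, on top of the stack.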
4.9.5 Page Level Protection
Page level protection involves two kinds of protection :
1. Restriction of addressable domain
2. Type checking
The U/S and R/W fields of PDEs and PTEs are used to control access to pages.
4.9.5.1 Restricting Addressable Domain
The U/S bit is 0 for pages that hold the operating system, other system software and related
data. These pages belong to the supervisor level. When the Pentium processor is executing at
supervisor level, all pages are addressable. If the U/S bit is 1, the page belongs to the user
level. When the Pentium processor is executing at user level, only pages that belong to the
user level are addressable.
4.9.5.2 Type Checking
At the level of page addressing two types of access are defined.
1. Read only access (R/W = 0)
2. Read/write access (R/W = 1)
When the Pentium processor is executing at supervisor level, all pages are treated as having
read/write access, whereas at user level page access depends on the R/W bit in the PDE and
PTE. If the R/W bit is 0 pages are only readable, and if the R/W bit is 1 pages are both
readable and writeable. When the Pentium processor is executing at user level, it cannot
access a page that belongs to the supervisor level.
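The two page level checks can be combined in a short Python sketch. It assumes the Intel encoding in which U/S = 1 marks a user page and R/W = 1 marks a writeable page, treats CPL 0 to 2 as supervisor level and CPL 3 as user level, and takes the already combined PDE/PTE bits as input. The function name page_access_allowed is hypothetical.

```python
def page_access_allowed(cpl: int, us_bit: int, rw_bit: int, is_write: bool) -> bool:
    """Decide whether a page access passes page level protection.

    us_bit: 0 = supervisor page, 1 = user page (assumed encoding).
    rw_bit: 0 = read only, 1 = read/write (assumed encoding).
    """
    supervisor = cpl < 3          # CPL 0, 1 and 2 run at supervisor level
    if supervisor:
        return True               # every page is addressable and writeable
    if us_bit == 0:
        return False              # user code cannot touch supervisor pages
    if is_write and rw_bit == 0:
        return False              # user write to a read only page
    return True


print(page_access_allowed(cpl=3, us_bit=1, rw_bit=0, is_write=False))
print(page_access_allowed(cpl=3, us_bit=1, rw_bit=0, is_write=True))
print(page_access_allowed(cpl=0, us_bit=0, rw_bit=0, is_write=True))
```

The sketch checks the addressable domain first and the access type second, mirroring the order in which the two protections are presented.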
4.10 Privileged Instructions
There are 19 privileged instructions supported by Pentium processor. Privileged
instructions are those that affect the segmentation and protection mechanism, alter the
interrupt flag, or perform peripheral I/O. These instructions are divided into two groups.
1. Privileged Instructions (Group I)
2. IOPL - Sensitive Instructions (Group II)
4.10.1 Privileged Instructions
The instructions that affect the system data structures come under the first group. The
instructions in this group can be executed only when the CPL is 0; otherwise the Pentium
processor generates a general protection exception. Table 4.5 shows the instructions from
group I (Privileged Instructions).
Instruction                        Action
HLT                                Halts the processor
CLTS                               Clears task-switched flag
LGDT, LIDT, LLDT                   Loads GDT, IDT, LDT registers
LTR                                Loads task register
LMSW                               Loads machine status word
MOV CRn, REG / MOV REG, CRn        Moves to/from control registers
MOV DRn, REG / MOV REG, DRn        Moves to/from debug registers
MOV TRn, REG / MOV REG, TRn        Moves to/from test registers
Table 4.5 Privileged instructions
4.10.2 IOPL Sensitive Instructions
Here, the IOPL field in the EFLAGS register defines the right to use I/O related
instructions. Hence the instructions from this group are called IOPL sensitive instructions.
Table 4.6 shows the IOPL sensitive instructions.
Instruction    Action
CLI            Disables interrupts
STI            Enables interrupts
IN, INS        Inputs data from I/O port
OUT, OUTS      Outputs data to I/O port
Table 4.6 IOPL - sensitive instructions
In order to execute these instructions, the CPL of a procedure or task must be the
same as or numerically lower than the number represented by the IOPL bits (CPL ≤ IOPL).
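The IOPL comparison can be illustrated with a few lines of Python. The sketch assumes that the IOPL field occupies bits 12 and 13 of EFLAGS and it models only the CPL versus IOPL comparison; the I/O permission bit map held in the TSS is deliberately ignored. The names IOPL_SENSITIVE, iopl_of and passes_iopl_check are hypothetical.

```python
# The instructions listed in Table 4.6.
IOPL_SENSITIVE = frozenset({"CLI", "STI", "IN", "INS", "OUT", "OUTS"})


def iopl_of(eflags: int) -> int:
    """Extract the two-bit IOPL field (assumed to be bits 12-13 of EFLAGS)."""
    return (eflags >> 12) & 0b11


def passes_iopl_check(mnemonic: str, cpl: int, eflags: int) -> bool:
    """Return True when the IOPL rule does not block the instruction."""
    if mnemonic.upper() not in IOPL_SENSITIVE:
        return True                       # the IOPL rule does not apply
    return cpl <= iopl_of(eflags)         # numerically lower = more privileged


eflags = 1 << 12                          # IOPL = 1
for cpl in range(4):
    print(f"CPL={cpl}: CLI allowed = {passes_iopl_check('CLI', cpl, eflags)}")
```

With IOPL = 1, only code running at CPL 0 or CPL 1 passes the check for CLI.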
4.11 Special Protection Mode Instructions
Instruction    Action
SGDT           Store Global Descriptor Table
SIDT           Store Interrupt Descriptor Table
STR            Store Task Register
SLDT           Store Local Descriptor Table
LGDT           Load Global Descriptor Table
LIDT           Load Interrupt Descriptor Table
LTR            Load Task Register
LLDT           Load Local Descriptor Table
ARPL           Adjust Requested Privilege Level
LAR            Load Access Rights
LSL            Load Segment Limit
VERR/VERW      Verify Segment for Reading or Writing
LMSW           Load Machine Status Word (lower 16 bits of CR0)
SMSW           Store Machine Status Word
4.12 Demand Paging
Paging hardware of Pentium processor has three major capabilities :
• Address translation
• Page - level protection
• Demand paging
In the last section we have seen the address translation mechanism by which a logical
address is converted into a physical address when paging is enabled, and we have also seen
the page level protection. In this section, we are going to see demand paging.
Demand paging allows a system to create a virtual memory environment for its programs.
Neither the program code nor the programmer writing it needs to know how much RAM
is really available in the system or where it is located. If a program references a location
which is not in the main memory, the Pentium processor calls a page fault handler.
The page fault handler routine then retrieves the desired data from secondary
storage (such as a disk) and places it in memory. The previous contents of that memory are
swapped out to the disk. In this way, it is possible to create the impression of a
system with a huge amount of main memory. Its actual size and its location are never
known to the program or the programmer, but everything runs as desired.
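The behaviour of a page fault handler can be imitated with a small Python simulation. This is only a sketch: the text does not say which page is swapped out, so a first-in first-out choice is assumed here, and the class name DemandPager and its attributes are hypothetical.

```python
from collections import OrderedDict


class DemandPager:
    """Toy model of demand paging with a fixed number of page frames."""

    def __init__(self, frames: int):
        if frames < 1:
            raise ValueError("at least one page frame is required")
        self.frames = frames
        self.resident = OrderedDict()    # page -> frame, oldest page first
        self.faults = 0

    def access(self, page: int) -> int:
        """Return the frame holding `page`, loading it on a page fault."""
        if page in self.resident:
            return self.resident[page]   # page is already in main memory
        self.faults += 1                 # page fault: handler takes over
        if len(self.resident) < self.frames:
            frame = len(self.resident)   # a free frame is still available
        else:
            # Assumed policy: swap out the page that was loaded first.
            _victim, frame = self.resident.popitem(last=False)
        self.resident[page] = frame      # "load" the page from the disk
        return frame


pager = DemandPager(frames=2)
for page in (0, 1, 0, 2, 0):
    pager.access(page)
print("page faults:", pager.faults)
```

The program sees five successful accesses although only two frames exist; the faults and the swapping stay hidden inside access(), which plays the role of the page fault handler.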
4.13 Moving to Protected Mode
The Pentium processor begins execution in real mode immediately after the RESET signal.
Before entering protected mode, it is essential to set up system tables such as the Global
Descriptor Table (GDT), the Interrupt Descriptor Table (IDT) and the Local Descriptor Table
(LDT). At least one GDT and one IDT must be defined in the system. The IDT must be at least
256 bytes long (32 gates of 8 bytes each), and the GDT must contain at least one code segment
descriptor and one data segment descriptor. Protected mode is entered by setting the PE bit
(bit 0) of CR0 with a MOV instruction to CR0. The PE bit can also be set by the
LMSW instruction, which maintains 80286 compatibility. After enabling protected mode,
the next instruction should be an intersegment JMP, which reloads the CS register with a
selector that points to a valid code segment descriptor. The following steps
accomplish the switch from the real mode to the protected mode.
• Prepare the GDT with a null descriptor in the first GDT entry, one code segment
descriptor, one stack segment descriptor and one data segment descriptor.
• Initialize the interrupt descriptor table so that it contains valid interrupt gates for at
least the first 32 interrupt type numbers. The IDT may contain up to 256 8-byte
interrupt gates defining all 256 interrupt types.
• Load the base address and limit of the GDT into the GDTR register, using the LGDT
instruction.
• Set the PE flag in the CR0 register, using the MOV CR0 or LMSW instruction (for
compatibility with Intel 286).
• Immediately execute an intersegment (far) jump to load the CS register and flush
the instruction decode queue.
• Load all the data segment registers with the initial selector values.
Fig. 4.32 (a) shows the tables needed and Fig. 4.32 (b) shows the descriptors
needed for a simple protected mode Pentium processor system. The simple protected mode
Pentium processor system has a single code segment and a single data/stack segment, each
4 Gbytes long, and a single privilege level PL = 0.
Fig. 4.32 (a) Simple protected system
Fig. 4.32 (b) GDT descriptor for simple system
An alternative approach to entering protected mode, which is especially appropriate for
multi-tasking operating systems, is to use the built-in task switch to load all of the
registers. In this case the GDT should contain two TSS descriptors in addition to the code
and data descriptors needed for the first task. The first JMP instruction in protected mode
should jump to the TSS, causing a task switch and loading all of the registers with the values
stored in the TSS. The task register (TR) should be initialized to point to a valid TSS
descriptor, since a task switch saves the state of the current task in a task state segment.
The steps required for entering protected mode using the alternative approach are as
follows :
• Initialize the interrupt descriptor table so that it contains valid interrupt gates for at
least the first 32 interrupt type numbers. The IDT may contain up to 256 8-byte
interrupt gates defining all 256 interrupt types.
• Initialize the global descriptor table so that it contains at least two task state
segment (TSS) descriptors, and the initial code and data segments required for the
initial task.
• Initialize the task register (TR) so that it points to a valid TSS descriptor, since a
task switch saves the state of the current task in a task state segment.
• Switch to protected mode by using an intersegment JMP to load the CS register
and flush the instruction decoder queue.
The first JMP instruction in protected mode then jumps to the TSS, causing a task
switch and loading all the registers with the values stored in the TSS.
4.14 Switching Back to Real Address Mode
It is possible to return to Real Mode from Protected Mode by resetting the PE bit of
the CR0 register with a MOV CR0, (Reg. or Mem) instruction. Before returning to the
real mode, however, one must make sure that all the values used by the processor are legal
Real Mode values. The following sequence of operations is suggested for returning to
the Real Mode.
1. If paging is enabled do the following operations
a. Transfer control to linear addresses that have an identity mapping. This means
that transfer the control to the addresses where linear addresses are equal to
physical addresses.
b. Clear the PG (Paging) bit in CR0.
c. Load zeroes to CR3 to clear out the paging cache.
2. Transfer control to a segment that has a limit of 64K (FFFFH). This ensures that
the contents of CS register are within the limit of 64K, which is required in real
mode.
3. Load segment registers SS, DS, ES, FS, and GS with a selector that points to a
descriptor containing the values given in the following Table 4.7.
Descriptor       Value
Base             Any
Limit            64K (FFFFH)
Present          P = 1
Writeable        W = 1
Expand up        E = 0
Byte granular    G = 0
Table 4.7
4. Disable interrupts with instruction clear interrupts (CLI). A CLI instruction disables
INTR interrupts. NMIs can be disabled using external circuitry.
5. Clear PE bit.
6. Flush the instruction queue by executing a Far JMP to the real mode code. This
also puts the appropriate values in the access rights of the CS register.
7. Load the base and limit of the real mode interrupt vector table/interrupt descriptor
table (IDT) using LIDT instruction.
8. Enable interrupts.
9. Load the segment registers as required by the real mode code.
4.15 Virtual Memory
In most modern computers, the physical main memory is not as large as the address
space spanned by an address issued by the processor. The virtual memory technique
uses secondary storage, such as disks, to extend the apparent size of the physical memory.
Let us see how this technique works.
When a program does not completely fit into the main memory, it is divided into
segments. The segments which are currently being executed are kept in the main memory
and the remaining segments are stored on secondary storage devices, such as a magnetic
disk. If an executing program needs a segment which is not currently in the main memory,
the required segment is copied from the secondary storage device. When a new segment of
a program is to be copied into the main memory, it must replace another segment already in
the memory. In modern computers, the operating system moves programs and data
automatically between the main memory and secondary storage. Techniques that
automatically swap program and data blocks between the main memory and a secondary
storage device are called virtual memory techniques. The addresses that the processor
issues to access either instructions or data are called virtual or logical addresses. These
addresses are translated into physical addresses by a combination of hardware and software
components. If a virtual address refers to a part of the program or data space that is
currently in the main memory, then the contents of the appropriate location in the main
memory are accessed immediately. On the other hand, if the referenced address is not in the
main memory, its contents must be brought into a suitable location in the main memory
before they can be used.
We have seen how virtual memory removes the programming burden of a small,
limited amount of main memory. It also allows efficient and safe sharing
of memory among multiple programs. Consider a number of programs running at once on
a computer. The total memory required by all the programs may be much larger than the
amount of main memory available on the computer, but only a fraction of this memory is
actively being used at any point in time. The main memory needs to contain only the active
portions of the many programs. This allows us to share the processor as well
as the main memory efficiently.
Fig. 4.33 shows a typical memory organisation that implements virtual memory. The
memory management unit controls this virtual memory system. It translates virtual addresses
into physical addresses. A simple method for translating virtual addresses into physical
addresses is to assume that all programs and data are composed of fixed-length units called
pages, as shown in Fig. 4.34. Pages constitute the basic unit of information that is
moved between the main memory and the disk whenever the page translation mechanism
determines that swapping is required.
Fig. 4.33 Virtual memory organisation
Fig. 4.34 Paged organisation of the physical address space
4.15.1 Address Translation
In virtual memory, the address is broken into a virtual page number and a page offset.
Fig. 4.35 shows the translation of the virtual page number to a physical page number. The
physical page number constitutes the upper portion of the physical address, while the
page offset, which is not changed, constitutes the lower portion. The number of bits in the
page offset field decides the page size.
Fig. 4.35 Virtual to physical address translation
The page table is used to keep the information about the main memory location of each
page. This information includes the main memory address where the page is stored and
the current status of the page. To obtain the address of the corresponding entry in the
page table, the virtual page number is added to the contents of the page table base register,
in which the starting address of the page table is stored. The entry in the page table gives
the physical page number, to which the offset is added to get the physical address in the
main memory.
If the page required by the processor is not in the main memory, a page fault occurs
and the required page is loaded into the main memory from the secondary storage
by a special routine called the page fault routine. This technique of getting the desired
page into the main memory is called demand paging.
To support demand paging and virtual memory, the processor has to access the page table,
which is kept in the main memory. To avoid this extra access time and the resulting
degradation of performance, a small portion of the page table is accommodated in the memory
management unit. This portion is called the translation lookaside buffer (TLB) and it is
used to hold the page table entries that correspond to the most recently accessed pages.
When the processor finds the page table entry in the TLB, it does not have to access the
page table and saves substantial access time.
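The translation path described above, including the TLB, can be sketched in Python. The sketch assumes 4 KB pages (a 12-bit offset) and a single-level page table, as in Fig. 4.35, rather than the two-level PDE/PTE scheme of the Pentium processor. The page table is modelled as a dictionary, the TLB has unlimited size, and the names translate, PageFault, page_table and tlb are hypothetical.

```python
PAGE_SHIFT = 12                    # assumed: 4 KB pages, so a 12-bit offset
PAGE_SIZE = 1 << PAGE_SHIFT


class PageFault(Exception):
    """Raised when the referenced page is not in the main memory."""


def translate(vaddr: int, page_table: dict, tlb: dict) -> int:
    """Translate a virtual address into a physical address."""
    vpn = vaddr >> PAGE_SHIFT              # virtual page number
    offset = vaddr & (PAGE_SIZE - 1)       # the offset passes through unchanged
    if vpn in tlb:
        frame = tlb[vpn]                   # TLB hit: no page table access
    else:
        entry = page_table.get(vpn)        # TLB miss: consult the page table
        if entry is None or not entry["present"]:
            raise PageFault(f"virtual page {vpn:#x} is not in main memory")
        frame = entry["frame"]
        tlb[vpn] = frame                   # remember the translation
    return (frame << PAGE_SHIFT) | offset


page_table = {2: {"present": True, "frame": 7}}
tlb = {}
print(hex(translate(0x2ABC, page_table, tlb)))   # page table access, fills TLB
print(hex(translate(0x2010, page_table, tlb)))   # served from the TLB
```

A PageFault raised by translate() corresponds to the point at which the page fault routine would load the missing page and the access would be retried.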
Review Questions
1. Draw the programmer's model of Pentium processor in protected mode.
2. What is segmentation ?
3. Explain the necessity of protection in Pentium processor.
4. Assume (DS) = 0204H, (ESI) = 00002000H, paging is disabled and mode is protected
mode.
a. From which of the three descriptor tables (IDT, LDT, GDT) will the descriptor be
considered ? Give the descriptor entry number.
b. Assume appropriate values in the descriptor selected and explain how the address
translation takes place when the following instruction is executed.
MOV AX, [SI]
5. Explain the function of TI and RPL bit.
6. State how the granularity bit affects the limit field.
7. Explain the meaning and usage of ‘Expand down’ segments. How are the base and limit fields
interpreted for these segments ?
8. What are the various fields in page directory entry and page table entry ? What are their
uses ?
9. Explain the functions of RPL, CPL and DPL.
10. What is the purpose of TLB and descriptor cache ? How do they reduce system
overheads ?
11. Explain with an example how logical address is converted with respect to PDE, PTE (Page
Table Entry), Page frame, GDT/LDT. Assume suitable data and state the values you have
chosen.
12. What is the meaning of privileged instructions in Pentium processor ? State whether or not
PUSH and POP are privileged instructions.
13. State the privilege rules for
1. Accessing data in code segment, 2. Control transfer
14. Discuss the mechanism by which Pentium processor user operating at PL3 or PL2 can call
procedures at high privileged level through CALL gates. Outline clearly the checks made by
Pentium processor.
15. Why does Pentium processor support different stacks when it changes privilege level during
CALLs ? What parameters are saved ? Give details.
16. What is the function of NT bit ?
17. What is a difference between conforming code segment and non-conforming code segment ?
18. How many global descriptors can be stored in GDT ? Justify.
19. How does the selector choose the local descriptor table and Pentium processor access it ?
20. Explain in detail segment level protection using privilege mechanism.
21. What is the purpose of Word Count field in call gate descriptor ?
22. What do you mean by segment descriptor cache ? Explain the use of it.
23. Explain the purpose, structure and locations of various descriptor tables used in Pentium
processor.
24. What are privileged and sensitive instructions ?
25. Explain the page level protection mechanism in Pentium processor.
26. Write down the procedure to enter in protected mode from real mode.
27. Write down the steps to switch back to real mode from protected mode.
28. Write a short note on virtual memory.
Multitasking
5.1 Introduction
Microcomputer systems are often shared by several computer operators, or users. Each user
commonly has a terminal that is connected to the computer, and each is allowed to do his
own work. For example, one user in the personnel department may calculate the company
payroll, another in finance may prepare financial estimates, and another in engineering may
use the computer to do CAD work. All these users are using the same computer at the same
time. Now the question is : how can one computer serve so many users ? The answer is that a
digital computer runs at extremely high speed and can therefore share its time among more
than one user. In reality, the computer serves only one user for a short time and
then moves on to the next user, and so on. Each user is allotted a time slice. This time
slice is a small fraction of a second. The computer allocates this time to all users in
rotation. This technique is known as time sharing, and this is how nearly all computers
support multiple user operations.
Time Sharing
• Allows multiple users to use the same computer
• Provides economical use of processing resources
• Is invisible to the users
• Can work for any number of users
An operating system which coordinates the actions of a time-share system such as this
is referred to as a multi-user operating system. The program or section of program for
each user is referred to as a task, so a multi-user operating system implies multitasking.
The reverse is not necessarily true.
Multi-user Versus Multitasking
• Multi-user = Many different people using one computer.
• Multitasking = Many different tasks on one computer.
• A multi-user computer can perform many tasks for many users.
• A multitasking computer can perform many tasks for one user.
5.2 Scheduling Methods for Multi-user Operating System
There are different approaches for implementing a multi-user operating system.
1. Time-slice scheduling
2. Pre-emptive priority based scheduling
5.2.1 Time-Slice Scheduling
The specific component of the operating system which determines when it is
time to switch from one task to another is called the scheduler, dispatcher or supervisor.
In the previous discussion we have seen the time-slice method, in which the CPU executes one
task for a small period of time (a fraction of a second) and then switches to the next task.
After executing all tasks in sequence, the CPU returns to the first task. The advantage of the
time-slice approach in a multi-user system is that all users are serviced at approximately
equal time intervals. As the number of users increases, each user gets a time slice less
often, so each user's program takes more time to execute. This is referred to as system
degradation. For this reason, systems with a large number of users prefer the pre-emptive
priority-based scheduling approach.
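System degradation under time-slice scheduling can be demonstrated with a small Python simulation. It is a sketch under simple assumptions: every task needs a fixed number of time units, the time slice is a whole number of units, and the cost of the task switch itself is ignored. The function name round_robin and the task names are hypothetical.

```python
from collections import deque


def round_robin(burst_times: dict, time_slice: int) -> dict:
    """Return the completion time of each task under time-slice scheduling.

    burst_times maps a task name to the CPU time the task needs.
    """
    if time_slice < 1:
        raise ValueError("time_slice must be at least 1")
    remaining = dict(burst_times)
    queue = deque(burst_times)           # tasks are served in rotation
    clock = 0
    finished = {}
    while queue:
        task = queue.popleft()
        run = min(time_slice, remaining[task])
        clock += run                     # the task uses its time slice
        remaining[task] -= run
        if remaining[task] == 0:
            finished[task] = clock       # the task is complete
        else:
            queue.append(task)           # back to the end of the rotation
    return finished


print(round_robin({"A": 3}, time_slice=1))
print(round_robin({"A": 3, "B": 3}, time_slice=1))
print(round_robin({"A": 3, "B": 3, "C": 3}, time_slice=1))
```

Task A needs the same three units of CPU time in every run, yet its completion time grows as more users are added, which is the degradation described above.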
5.2.2 Pre-emptive - Priority Based Scheduling
In this system, each task is given a priority number, and higher priority tasks are
allowed to interrupt lower priority tasks. This means that while a lower priority task is
executing, a higher priority task can take control; after the higher priority task
completes, control returns to the lower priority task. This approach is suitable for most
applications, because it allows the most important tasks to be done first.
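Pre-emption can be shown with a short Python simulation that advances one time unit at a time. The sketch assumes that a larger priority number means a more important task and that a task switch costs no time; the text does not fix either point. The function name preemptive_priority and the task tuples are hypothetical.

```python
def preemptive_priority(tasks):
    """Return the task that runs in each time unit (None when idle).

    tasks is a list of (name, arrival_time, burst_time, priority) tuples;
    a larger priority number is assumed to mean a more important task.
    """
    remaining = {name: burst for name, _arrival, burst, _prio in tasks}
    timeline = []
    clock = 0
    while any(remaining.values()):
        ready = [t for t in tasks if t[1] <= clock and remaining[t[0]] > 0]
        if not ready:
            timeline.append(None)                     # nothing has arrived yet
        else:
            name = max(ready, key=lambda t: t[3])[0]  # highest priority wins
            remaining[name] -= 1
            timeline.append(name)
        clock += 1
    return timeline


print(preemptive_priority([("low", 0, 3, 1), ("high", 1, 2, 5)]))
```

In the resulting timeline the task named low starts first, is interrupted as soon as high arrives, and resumes only after high has completed.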
5.2.3 Context Switching
Each task uses registers, data pointers, memory pointers, memory variables, a stack area,
etc. These together are referred to as the environment or context of that task. When a task
switch occurs, the environment or context of the interrupted task must be saved so that the
task can be continued properly when it receives another time-slice. This environment, and a
pointer to it, is usually stored in a special memory segment or on a stack. When it is
necessary to switch back to the task again, the operating system uses the pointer to access
the saved environment. This process is known as context switching.
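Saving and restoring a context can be sketched in Python as follows. The model is deliberately small: the context is reduced to three registers, and the class names Task and Cpu as well as the register set are illustrative assumptions and do not reflect the real TSS layout.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    name: str
    # Saved environment (context) of the task while it is not running.
    saved: dict = field(default_factory=lambda: {"EIP": 0, "ESP": 0, "EAX": 0})


class Cpu:
    def __init__(self, first: Task):
        self.current = first
        self.regs = dict(first.saved)        # load the first task's context

    def switch_to(self, nxt: Task) -> None:
        self.current.saved = dict(self.regs) # save the outgoing context
        self.regs = dict(nxt.saved)          # restore the incoming context
        self.current = nxt


task_a, task_b = Task("A"), Task("B")
cpu = Cpu(task_a)
cpu.regs["EAX"] = 42          # task A computes something
cpu.switch_to(task_b)         # A's registers are saved, B's are loaded
cpu.regs["EAX"] = 7           # task B uses the same register
cpu.switch_to(task_a)         # A continues where it stopped
print(cpu.current.name, cpu.regs["EAX"])
```

After the second switch, task A sees its own value in EAX again, although task B used the same register in between.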
5.3 Support Registers and Related Descriptors for Multitasking
The Pentium processor has special registers and special descriptors to support an efficient
and protected multitasking system. These are :
• Task State Segment (TSS)
• Task State Segment Descriptor
• Task Register
• Task Gate Descriptor
With these registers and data structures the Pentium processor switches execution from
one task to another task, saving the environment of the current task so that the task can be
continued later.
Apart from the simple task switch, the Pentium processor supports two other task
management features :
• Interrupts and exceptions can cause task switches. The Pentium not only switches
automatically to the task that handles the interrupt or exception, but it also
automatically switches back to the interrupted task when the interrupt or exception
has been serviced. Interrupt tasks may interrupt lower-priority interrupt tasks to any
depth.
• As each task can have a separate LDT and page directory, it can have a different
logical-to-linear mapping and a different linear-to-physical mapping. Due to this,
tasks can be isolated and prevented from interfering with one another.
5.3.1 Task State Segment (TSS)
Fig. 5.1 shows the format of a TSS. It is a special type of segment, used to manage the
task. The Pentium processor uses the TSS like a scratch-pad. It stores everything it needs to
know about a task in the TSS. This means that the task environment (context) is stored in the
TSS. The TSS is not accessible to the general user program, or even to a program at privilege
level 0. The fields within the TSS are accessible only to the Pentium processor. The fields
of a TSS are divided into two sets : dynamic set and static set.
1. Dynamic Set :
The Pentium processor updates the dynamic set when it switches from one task to another
task. This set includes :
• The general registers (EAX, EBX, ECX, EDX, ESP, EBP, ESI, EDI)
• The segment registers (CS, SS, DS, ES, FS, GS)
• The flag register (EFLAGS)
• The instruction pointer (EIP)
• Back link
The first four fields (general registers, segment registers/selectors, flags and instruction
pointer) save the state of the Pentium processor. Saving EIP guarantees
that the task can be restarted at the point at which it was stopped, and saving EFLAGS
allows the Pentium processor to execute conditional instructions properly when the task is
restarted.
The Back Link is used by the Pentium processor to keep track of a previous task. By
executing a return instruction at the end of the new task, the back link selector for the
previous TSS is automatically loaded into the Task Register. This activates the previous task
and restores the prior program environment.
Fig. 5.1 Task state segment
2. Static Set :
The Pentium processor only reads fields from this set. This set includes :
• The selector for the task’s LDT
• The register (PDBR) that contains the base address of the task’s page directory
• Pointers to the stacks for privilege levels 0-2
• The T-bit (debug trap bit) which causes the Pentium processor to raise a debug
exception when a task switch occurs.
• The I/O map offset.
Note : TSS static set saves the selector for the task’s LDT. This means that TSS
descriptors must appear only in the GDT.
Task switching may change the privilege level, changing the addressable domain of the
program. As the rule says that the privilege level of the stack segment must exactly match
the privilege level of the code segment at all times, the Pentium processor has to change
stacks whenever there is a change in privilege level. The previous stack segment and pointer
are therefore set aside, and a new stack is used that corresponds to the new privilege level.
When control is returned to the previous level, the previous stack is restored. The fields
ESP0, ESP1, ESP2, SS0, SS1 and SS2 hold the stack pointers and stack selectors for privilege
levels 0, 1 and 2.
The I/O map base holds the 16-bit offset of the beginning of the I/O permission bit
map. It is implemented on a task-by-task basis and affects the hardware privilege checking
only for I/O instructions. Privilege checking mechanism for I/O is described in the
previous section of this chapter.
5.3.2 TSS Descriptor
Like other segments, the task state segment is defined by a descriptor, called the TSS
descriptor. Fig. 5.2 shows the task state segment descriptor. It contains fields like those
of other segment descriptors. The B-bit in the type field indicates whether the task is
busy. Tasks are not re-entrant. The B-bit allows the Pentium processor to detect an attempt
to switch to a task that is already busy. The BASE, LIMIT, and DPL fields and the G-bit and
the P-bit have functions similar to those in other descriptors. The limit field, however,
must have a value equal to or greater than 103 (104 - 1), because the Pentium processor
requires a minimum of 104 bytes of storage in order to perform a context save. A larger
limit is permissible, and it is required if an I/O permission map is present. The maximum
limit for a TSS is 4 GByte.
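The limit rule can be illustrated with a short Python sketch. It is only an illustration
of the arithmetic, not processor code: the names `TSS_MIN_BYTES` and `tss_limit_ok` and
the `extra_bytes` parameter are hypothetical, and the sketch assumes any extra data (for
example an I/O permission map) simply adds to the 104-byte fixed portion.

```python
TSS_MIN_BYTES = 104  # fixed portion needed for a context save


def tss_limit_ok(limit: int, extra_bytes: int = 0) -> bool:
    """Return True if the descriptor limit covers the fixed TSS portion
    plus any extra bytes stored after it (hypothetical helper)."""
    required_limit = TSS_MIN_BYTES + extra_bytes - 1  # limit = size - 1
    return limit >= required_limit


print(tss_limit_ok(103))                  # minimum legal limit
print(tss_limit_ok(102))                  # one byte too small
print(tss_limit_ok(103, extra_bytes=32))  # too small once a 32-byte map is added
```

The subtraction of 1 reflects the fact that a limit holds the offset of the last valid
byte, so a 104-byte segment has a limit of 103.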
[Figure: TSS descriptor format showing the segment base and segment limit fields]
Fig. 5.2 Task state segment descriptor
To access a TSS descriptor, a procedure must have a privilege level numerically less than
or equal to the privilege level specified by the DPL field of the TSS descriptor. Usually
this access is restricted to trusted software whose privilege level is zero. This is done
by setting the DPL field of the TSS descriptor to zero. Thus only trusted software has the
right to perform task switching.
5.3.3 Task Register (TR)
The Task Register (TR) specifies the currently executing task by pointing to the TSS.
Fig. 5.3 shows the path by which the Pentium processor accesses the current task. The Task
Register is a selector for the TSS.
[Figure: the 16-bit visible part of TR selects a TSS descriptor in the Global Descriptor
Table; the hidden part of TR caches the descriptor and locates the Task State Segment]
Fig. 5.3 Task register
It has both a visible portion, which can be read and changed by instructions, and an
invisible portion, which is maintained by the Pentium processor to correspond to the
visible portion and cannot be read by any instruction. The selector in the visible portion
is used to specify a TSS descriptor in the GDT, and the invisible portion is used to cache
the base and limit values from the TSS descriptor. Holding the base and limit in the
invisible portion of the Task Register makes execution of the task more efficient, because
the processor does not need to repeatedly fetch these values from memory when it
references the TSS of the current task.
The Pentium processor provides two instructions to read and modify the visible portion of
the task register : LTR (Load Task Register) and STR (Store Task Register).
LTR (Load Task Register) :
It loads the visible portion of the task register with the selector and the invisible
portion with information from the TSS descriptor selected by that selector. LTR is a
privileged instruction. Thus it is executed only when CPL is zero.
STR (Store Task Register) :
It stores the visible portion of the task register in a general register or memory word.
STR is not a privileged instruction.
5.3.4 Task Gates and Task Gate Descriptor
Task gates, like call gates, are special system gates. Each task gate has its own
descriptor. A task gate descriptor does not define a memory segment but instead acts as an
interface point between user code and a task state segment. It provides an indirect and
protected reference to a TSS. Fig. 5.4 shows the format of a task gate descriptor. A task
gate descriptor defines a selector to a TSS descriptor which uniquely identifies a task.
Like the selector to a call gate, the selector to a task gate can be used in place of a
selector to a code segment in FAR JMP and FAR CALL instructions.
Fig. 5.4 Task gate descriptor
As mentioned earlier, the DPL field of a task gate controls the right to use the
descriptor to cause a task switch. A procedure can select a task gate descriptor only when
the maximum of the selector's RPL and the CPL of the procedure is numerically less than or
equal to the DPL of the descriptor.
MAX (CPL, RPL) ≤ task gate DPL
If the DPL of a TSS descriptor is 0, this privilege constraint prevents untrusted
procedures (procedures having privilege level from 1 to 3) from causing a task switch
through that descriptor directly. But through task gates we can switch from lower privilege
to higher privilege, because when a task gate is used, the DPL of the target TSS descriptor
is not used for privilege checking. Thus a procedure that has access to a task gate has the
power to cause a task switch.
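The gate access rule can be written as a one-line comparison. The following Python sketch
is a minimal illustration only; the function name `may_use_task_gate` is hypothetical and
the sketch models nothing beyond the MAX (CPL, RPL) ≤ DPL test.

```python
def may_use_task_gate(cpl: int, rpl: int, gate_dpl: int) -> bool:
    """Return True if a procedure running at CPL, using a selector with
    the given RPL, may use a task gate whose DPL is gate_dpl."""
    for level in (cpl, rpl, gate_dpl):
        if level not in range(4):
            raise ValueError("privilege levels are 0 to 3")
    # Numerically larger means less privileged, so the weaker of CPL and
    # RPL must still be at least as privileged as the gate's DPL.
    return max(cpl, rpl) <= gate_dpl


print(may_use_task_gate(cpl=3, rpl=3, gate_dpl=3))  # user code, DPL 3 gate
print(may_use_task_gate(cpl=3, rpl=0, gate_dpl=0))  # user code, DPL 0 gate
```

A task gate with DPL 3 therefore lets level 3 code reach a task whose TSS descriptor has
DPL 0, because only the gate's DPL takes part in the comparison.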
5.4 Task Switching
It is important to note that after every task switch, i.e. after loading a new context from
a TSS and updating TR, the Pentium processor marks the new TSS as "busy". It does this
by setting bit 41 in the TSS descriptor of the task that is now running. Therefore the
currently running task is always a busy task. The Pentium processor cannot switch into a
task which is busy. Tasks are not re-entrant, and task switches therefore cannot be
recursive.
The Pentium processor does task switching in any of four cases :
1. A long jump or call instruction contains a selector which refers to a TSS descriptor.
This is the simplest method and can be easily implemented by the operating
system kernel at the end of a time slice.
2. The selector in a long jump or call instruction refers to a task gate. In this case the
selector for the destination TSS is in the task gate. This indirect method has
advantages regarding privilege levels and protection.
3. The interrupt selector refers to a task gate in the interrupt descriptor table. The
task gate contains the selector for the new TSS. If the access passes all the privilege
level tests, the selector and descriptor for the interrupt task will be loaded into the
task register. The nested task (NT) bit in the EFLAGS register will be set.
4. An IRET instruction is executed with the NT bit in the EFLAGS register set. The
IRET instruction uses the back link selector in the TSS to return execution to the
interrupted task.
5.4.1 Task Switching Without Task Gate
Fig. 5.5 shows task switch operation.
[Figure: a task-switching instruction selects a TSS descriptor in the GDT; the processor
state, including EFLAGS and CR3, is exchanged with the TSS]
Fig. 5.5 Task switch operation
Steps Involved In Task Switching (Without Task Gate)
1. Privilege Check : The current task is checked to see whether it is allowed to
switch to the designated task. This is done by checking the DPL of the designated TSS
against the RPL and CPL of the current task. Only if the DPL of the TSS descriptor is
numerically greater than or equal to the maximum of the CPL and the RPL of the
selector is the current task allowed to switch to the designated task.
2. Limit and Present Bit Checking : The TSS descriptor for the designated task is
checked for its limit and presence.
3. Saving the State of the Current Task : The Pentium processor finds the base
address of the current TSS cached in the task register. It copies the registers into
the current TSS (EAX, ECX, EDX, EBX, ESP, EBP, ESI, EDI, ES, CS, DS, SS, FS, GS,
the flag register and EIP). The EIP field of the TSS points to the instruction after
the one that caused the task switch. The selector for the current task is saved as a
back link selector in the new task.
4. Loading of Task Register : The visible portion of the task register is loaded with
the selector of the designated task's TSS descriptor. This sets the TS (Task Switch)
bit in the Machine Status Word (MSW). The TS bit is useful to systems software
when a coprocessor is present. The TS bit signals that the context of the
coprocessor may not correspond to the current Pentium processor task. The B-bit
in the new task's descriptor is marked busy. Then the corresponding task state
descriptor is read from the GDT and loaded into the task register cache (hidden
portion of the task register).
5. Resuming Execution : Finally, the Pentium processor starts execution of the designated
task at the instruction pointed to by the new contents of the code segment selector
(CS) and instruction pointer (EIP).
The old program environment is preserved by saving the selector for the old TSS as
the back link selector in the new TSS. When an IRET instruction is executed at the end of
the new task, the back link selector for the old TSS is automatically reloaded into TR and
program execution resumes at the point where it left off in the old task.
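The five steps described in this section can be traced with a small Python model. This is
a simplified sketch under stated assumptions, not a description of the processor's
internals: the classes `TSS` and `Cpu`, the dictionary used as a GDT, and the selector
values are all hypothetical, and the old task is left marked busy, as for a nested
(CALL-style) switch.

```python
from dataclasses import dataclass, field


@dataclass
class TSS:
    selector: int
    dpl: int = 0
    limit: int = 103
    present: bool = True
    busy: bool = False
    back_link: int = 0
    saved: dict = field(default_factory=dict)  # saved register image


class Cpu:
    def __init__(self, gdt, current_selector, registers):
        self.gdt = gdt                # hypothetical model: selector -> TSS
        self.tr = current_selector    # visible part of the task register
        self.regs = dict(registers)
        self.ts_flag = False          # TS bit of the machine status word
        gdt[current_selector].busy = True

    def switch_task(self, selector, cpl, rpl):
        new = self.gdt[selector]
        # Step 1: privilege check, MAX(CPL, RPL) <= DPL of the TSS descriptor.
        if max(cpl, rpl) > new.dpl:
            raise PermissionError("privilege check failed")
        # Step 2: limit and present-bit checks; busy tasks cannot be entered.
        if not new.present or new.limit < 103:
            raise ValueError("TSS not present or limit too small")
        if new.busy:
            raise RuntimeError("tasks are not re-entrant")
        # Step 3: save the current state and record the back link.
        old = self.gdt[self.tr]
        old.saved = dict(self.regs)
        new.back_link = old.selector
        # Step 4: load the task register, set TS, mark the new task busy.
        self.tr = selector
        self.ts_flag = True
        new.busy = True
        # Step 5: resume the new task from its saved register image.
        self.regs = dict(new.saved)


gdt = {0x28: TSS(0x28), 0x30: TSS(0x30, saved={"EIP": 0x2000})}
cpu = Cpu(gdt, 0x28, {"EIP": 0x1000})
cpu.switch_task(0x30, cpl=0, rpl=0)
print(hex(cpu.tr), hex(cpu.regs["EIP"]), hex(gdt[0x30].back_link))
```

After the switch, the old task's EIP is held in its own TSS image and its selector is held
in the new task's back link, which is the information needed to return to it later.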
5.4.2 Task Switching with Task Gate
In this, the indirect method is used for task switching. Task switching is done by
jumping to or calling a task gate. Fig. 5.6 shows task switching through a task gate.
Steps Involved in Task Switching (Using Task Gate)
1. Privilege Check : When a task gate is used, the DPL of the new TSS descriptor is
not used for privilege checking. The DPL of the task gate is compared with the
CPL and the RPL of the gate selector. If the DPL of the task gate is numerically
greater than or equal to the maximum of the CPL and the RPL of the gate selector, the
current task is allowed to switch to the designated/new task. The remaining steps
are similar, except that the selector for the TSS descriptor loaded into TR is taken
from the task gate instead of from the CALL or JMP instruction.
In case of exceptions, interrupts and IRETs, the current task is allowed to switch to the
new task regardless of the DPL of the new task gate or TSS descriptor.
[Figure: task gates in the Local Descriptor Table and the Interrupt Descriptor Table point
to a task descriptor in the Global Descriptor Table, which locates the Task State Segment]
Fig. 5.6 Task switching through task gate
5.4.3 Nested Tasks
Nested tasks are analogous to nested subroutines. If a task switch was caused by a FAR
CALL instruction or by an exception, fault or trap, the new task is considered to be nested
within the old task that invoked it. In any of these cases, when the task executes an IRET
instruction, the Pentium processor automatically task-switches back to the task that
invoked it. To do so, there is a mechanism of linking the tasks, which is the equivalent of
a call/return stack. The task linking mechanism consists of the Back Link and the NT
(Nested Task) flag. The Back Link is used to keep track of the previous task. When an IRET
instruction is executed at the end of the new task, the back link selector for the previous
TSS is automatically loaded into the task register. This activates the previous task and
restores the prior program environment. The Pentium processor sets the NT (Nested Task)
flag in the EFLAGS register when one system task invokes another task. The Pentium
processor uses NT as a flag so that it can tell whether the Back Link field in the current
TSS is valid, should it encounter an IRET instruction. This is the only means by which the
Pentium processor determines whether it should perform a task switch or a normal IRET.
A RET instruction does not 'unnest' tasks, even if they were nested by CALL
instructions. Only IRET can 'unnest' tasks.
Nested Task Switches
= Nested tasks act like subroutines.
= CALL instruction to task gate will nest tasks.
= Interrupt or exception to task gate will nest tasks.
= JMP instruction will not nest tasks.
= New TSS gets old TSS selector in Back Link field.
= New task gets nested task bit set in EFLAGS register.
= New task must return to old task with IRET instruction.
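The linking rules summarised above can be modelled in a few lines of Python. This is a
hedged sketch: the dictionary used as a TSS image and the names `enter_task` and
`iret_target` are hypothetical. The only hardware fact relied upon that is not stated in
the text is that NT is bit 14 of EFLAGS.

```python
NT = 1 << 14  # Nested Task flag, bit 14 of EFLAGS

NESTING_CAUSES = {"CALL", "INTERRUPT", "EXCEPTION"}


def enter_task(cause, old_selector, new_tss):
    """Apply the nesting rules to the TSS image of the task being entered."""
    if cause in NESTING_CAUSES:
        new_tss["back_link"] = old_selector  # remember who invoked us
        new_tss["eflags"] |= NT              # mark the task as nested
    elif cause != "JMP":                     # JMP switches without nesting
        raise ValueError(f"unknown cause: {cause}")
    return new_tss


def iret_target(current_tss):
    """Return the selector IRET switches back to, or None for a normal IRET."""
    if current_tss["eflags"] & NT:
        return current_tss["back_link"]
    return None


called = enter_task("CALL", 0x28, {"back_link": 0, "eflags": 0x2})
jumped = enter_task("JMP", 0x28, {"back_link": 0, "eflags": 0x2})
print(iret_target(called), iret_target(jumped))
```

Only the task entered by CALL has NT set, so only it returns to selector 0x28 on IRET; the
task entered by JMP performs an ordinary IRET.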
5.5 I/O Protection
The Pentium processor supports two mechanisms for protecting I/O ports in protected
mode :
1. The IOPL field in the EFLAGS register defines the right to use I/O related
instructions (I/O privilege level).
2. The I/O permission bit map of a Pentium processor TSS segment defines the right
to use ports in the I/O address space.
5.5.1 I/O Privilege Level
In this mechanism, for execution of IN, INS, OUT, OUTS, CLI and STI instructions, the
CPL of a procedure or task must be numerically less than or equal to the number
represented by the IOPL bits (CPL ≤ IOPL).
[Figure: the I/O bit map offset field in the TSS points to the start of the I/O permission
bit map]
Fig. 5.7 I/O address bit map
The I/O permission bit map is a bit vector. The size of the map and its location in the
TSS (Task State Segment) are variable. The Pentium processor locates the I/O permission
map by means of the I/O map base field, which is in the fixed portion of the TSS. Each
bit in the map corresponds to an I/O port byte address. Thus a 16-bit port uses 2 bits
and a 32-bit port uses 4 bits. To access an I/O port, the corresponding bits in the I/O
bit map must be 0.
When a program attempts to access a port, the Pentium processor first compares the
CPL of the task with the IOPL. If CPL ≤ IOPL, the access is allowed without consulting
the map. Otherwise, the Pentium processor checks the map bits corresponding to the
addressed port. If every corresponding bit is 0, access is granted.
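The port check can be sketched in Python as follows. The sketch makes two assumptions
that should be read as such: the map is consulted only when CPL is numerically greater
than IOPL, and ports that lie beyond the end of the map are treated as denied. The
function name `io_access_allowed` and its parameters are hypothetical.

```python
def io_access_allowed(cpl, iopl, bitmap, port, width=1):
    """Decide whether an access of `width` bytes starting at `port` is allowed.

    bitmap is a bytes-like object in which bit (port % 8) of byte (port // 8)
    is 1 when the port is protected.
    """
    if cpl <= iopl:
        return True  # assumption: the IOPL test alone is sufficient here
    for p in range(port, port + width):
        byte_index, bit_index = divmod(p, 8)
        if byte_index >= len(bitmap):
            return False  # assumption: ports outside the map are denied
        if bitmap[byte_index] & (1 << bit_index):
            return False  # any set bit blocks the whole access
    return True


demo_map = bytearray(32)   # 256 ports, all accessible
demo_map[8] = 0b00000010   # protect port 41H only
print(io_access_allowed(3, 0, demo_map, 0x40))           # byte access to 40H
print(io_access_allowed(3, 0, demo_map, 0x40, width=2))  # word access spans 41H
```

The second call is refused because a 16-bit access uses two consecutive bits of the map
and one of them is set.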
Example 5.1 : A Pentium processor system has 256 I/O ports with addresses from 00H
to FFH. All these ports except 21H to 2FH are to be made accessible to a user at PL3. Show
what the I/O permission bit map looks like.
Solution : i) I/O permission bit map (in each row the leftmost bit belongs to the highest
port address and the rightmost bit to the lowest) :
FF  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0  F0
EF  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0  E0
DF  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0  D0
CF  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0  C0
BF  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0  B0
AF  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0  A0
9F  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0  90
8F  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0  80
7F  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0  70
6F  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0  60
5F  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0  50
4F  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0  40
3F  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0  30
2F  1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0  20
1F  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0  10
0F  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0  00
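The bit map of Example 5.1 can also be generated programmatically. The Python sketch
below is illustrative only; `build_io_bitmap` is a hypothetical helper, and the byte
layout (bit n of byte k stands for port 8k + n) is the usual convention for this map.

```python
def build_io_bitmap(num_ports, denied_ports):
    """Return a bytearray with one bit per port; 1 means access is denied."""
    bitmap = bytearray((num_ports + 7) // 8)
    for port in denied_ports:
        bitmap[port // 8] |= 1 << (port % 8)
    return bitmap


# Ports 00H to FFH, with 21H to 2FH protected as in Example 5.1.
bitmap = build_io_bitmap(256, range(0x21, 0x30))
print(len(bitmap))                      # 32 bytes cover 256 ports
print(hex(bitmap[4]), hex(bitmap[5]))   # 0xfe 0xff
```

Byte 4 covers ports 20H to 27H, so only its bit 0 (port 20H) is clear; byte 5 covers ports
28H to 2FH, all of which are protected. Every other byte of the map is zero.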
Review Questions
1. Write whether multitasking and multi-user systems are same, justify your answer.
2. Explain the methods by which task switch is forced.
3. What is context switching ?
4. Explain the Task Gate Descriptor.
5. What is the purpose of Task Register ?
6. What is a Task State Segment (TSS) ? Give the format of TSS descriptor. How does it differ
from gate descriptor ?
7. Explain how Pentium processor carries out task switching using on-chip data structures and
various registers.
8. What are the various data structures and registers that support multitasking in Pentium
processors ?
9. Write a short note on nested tasks.
Virtual Mode
6.1 Introduction
In a multitasking system, it is necessary to switch back and forth between real and
protected mode, because such a system runs a mixture of tasks : some use
segment-offset addressing (real mode addressing) and some use descriptors (protected
mode addressing). The 8086 virtual mode solves this problem. A Pentium operating in
protected mode can easily switch to virtual 8086 mode to execute a time slice of an 8086
program and then easily switch back to protected mode to execute a time slice of a
protected mode task. The Pentium allows execution of one or more 8086, 8088, 80186 or
80188 programs in a Pentium protected mode environment, as different tasks in the virtual
8086 mode.
In 8086 virtual mode, the Pentium treats the segment registers exactly the same way as
it does in Real Mode. Therefore, the address range of a virtual 8086 mode task is 1 Mbyte.
The segment and offset registers together give the linear address instead of the physical
address. The physical address is generated from the linear address with the help of page
translation. Thus the physical address may be anywhere in the 4 gigabyte memory
addressable by the Pentium.
In 8086 virtual mode, the Pentium provides a mechanism to selectively trap and manage
Input/Output and interrupt activity. Software can set the Input/Output Privilege Level
(IOPL) that selectively controls Input/Output transfers, and it can also use the
Input/Output port permission map to selectively control access to Input/Output ports.
This chapter covers the following topics concerning 8086 virtual mode.
= Entering and Leaving 8086 Virtual Mode
= Registers and Instructions
= Address calculations in 8086 Virtual Mode
= Paging in virtual 8086 mode
= Protection and I/O permission bitmap in virtual 8086 mode
6.2 Entering and Leaving 8086 Virtual Mode
The Pentium enters or leaves 8086 virtual mode due to any of the three reasons
shown in Fig. 6.1.
[Figure: an 8086 program (V86 mode), the V86 monitor (protected mode) and other Pentium
tasks (protected mode), linked by task switches, by interrupts and exceptions, and by IRET]
Fig. 6.1 Entering and leaving an 8086 program
1. An interrupt that vectors to a task gate.
2. An action of the scheduler of the Pentium operating system.
3. An IRET when the NT (Nested Task) flag is set.
6.2.1 Entering 8086 Virtual Mode
The Pentium can enter 8086 virtual mode by either of two means :
1. A task switch to a Pentium task loads the image of EFLAGS from the new TSS. If
the VM bit in the EFLAGS register is set, the Pentium enters virtual 8086 mode to
execute the new task. If the VM bit is not set, the Pentium executes the new task
as a normal protected mode task.
Note: If the TSS of the new task is an 80286 TSS, the Pentium does not enter 8086
virtual mode, because the 80286 TSS does not store the high-order word of
EFLAGS, which contains the VM flag.
2. An IRET from a procedure that loads the EFLAGS image changes the VM bit if
the Current Privilege Level (CPL) at the time of the IRET is zero. If the changed status
of the VM bit is 1, the Pentium enters 8086 virtual mode.
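The task-switch case can be expressed as a small Python check. This is a minimal sketch
with hypothetical names (`enters_v86_on_task_switch`, `tss_is_32bit`); the one detail it
adds to the text is that VM is bit 17 of EFLAGS, which is why a 16-bit flags image cannot
carry it.

```python
VM = 1 << 17  # Virtual 8086 Mode flag, bit 17 of EFLAGS


def enters_v86_on_task_switch(tss_is_32bit: bool, eflags_image: int) -> bool:
    """Return True if loading this TSS flags image puts the task in V86 mode."""
    if not tss_is_32bit:
        # An 80286 TSS stores only the low 16 bits of the flags,
        # so the VM bit can never be set from it.
        return False
    return bool(eflags_image & VM)


print(enters_v86_on_task_switch(True, 0x00020202))   # VM set, 32-bit TSS
print(enters_v86_on_task_switch(True, 0x00000202))   # VM clear
print(enters_v86_on_task_switch(False, 0x00020202))  # 80286 TSS
```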
6.2.2 Leaving 8086 Virtual Mode
The Pentium leaves 8086 virtual mode when an interrupt or exception occurs.
1. A task switch from an 8086 virtual task to any other task, caused by an interrupt or
exception, loads EFLAGS from the TSS of the new task. If the new TSS is a
Pentium TSS and the VM bit is zero, or if the TSS is an 80286 TSS, the Pentium
clears the VM bit of EFLAGS. It then loads the segment registers as defined by the
new TSS and begins executing the instructions of the new task according to the
Pentium protected mode description.
2. An interrupt or exception which vectors to a privilege-level zero procedure stores
the current setting of EFLAGS on the stack and then clears the VM bit. As the VM bit is
zero, the Pentium starts executing the instructions in its protected mode
environment.
6.3 Registers and Instructions
6.3.1 Registers
Virtual 8086 mode register set includes :
1. All the registers defined for the 8086 plus
2. The new registers introduced by the Pentium : FS, GS, debug registers, test
registers and control registers.
6.3.2 Instructions
In virtual mode, the Pentium can execute normal 8086 instructions as well as the new
instructions introduced by the 80186/80188, 80286 and Pentium, as listed below. The new
segment override prefixes allow the FS and GS segment registers to be used. Instructions
can utilize 32-bit operands through the use of the operand size prefix.
1. New instructions introduced by 80186/80188 and 80286
= PUSH immediate data
= PUSH ALL and POP ALL (PUSHA and POPA)
= Multiply immediate data
= Shift and rotate by immediate count
= String I/O
= ENTER and LEAVE
= BOUND
2. New instructions introduced by Pentium
= LSS, LFS, LGS instructions
= Long displacement conditional jumps
= Single bit instruction
= Bit scan
= Byte set on condition
= Double shift instruction
= Move with sign/zero extension
= Generalized multiply
Note : To access these instructions only 8086 addressing modes can be used.
6.4 Address Generation in 8086 Virtual Mode
In virtual 8086 mode, the contents of a segment register are not used as a selector
pointing to a descriptor. Instead, the segment register contents are used together with the
offset to generate the linear address. The linear address is generated by shifting the
contents of the appropriate segment register left by 4 bits and adding the result to the
effective address/offset.
Fig. 6.2 shows virtual 8086 mode address generation.
[Figure: the 16-bit segment register contents, shifted left by 4 bits, are added to the
16-bit effective address (offset) to produce the linear address]
Fig. 6.2 Virtual 8086 mode address generation
If the addition of the shifted segment register contents and the effective address
generates a carry, the carry is not discarded as it is on the 8086; the resulting 21-bit
value is used as the linear address. A Pentium in virtual 8086 mode is therefore allowed to
generate linear addresses anywhere in the range 0 to 10FFEFH (one megabyte plus
approximately 64 Kbytes) of the task's linear address space.
Virtual 8086 tasks generate 32-bit linear addresses. While an 8086 program can only
utilize the low-order 21 bits of a linear address, the linear address can be mapped via
page tables to any 32-bit physical address.
Unlike the 8086 and 80286, the Pentium can generate a 32-bit effective address with the
address size prefix. This address should not exceed 65535 (FFFFH), to maintain
compatibility with 80286 Real Mode; otherwise the Pentium generates a pseudo-protection
fault (INT 12 or INT 13 with no error code).
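The address calculation is easy to verify numerically. The Python sketch below is an
illustration with hypothetical function names; it raises `ValueError` where the text
describes a pseudo-protection fault, and it includes the 8086 wrap-around at 1 Mbyte for
comparison.

```python
def v86_linear_address(segment: int, offset: int) -> int:
    """Linear address formed in virtual 8086 mode (up to 21 bits)."""
    if not 0 <= segment <= 0xFFFF:
        raise ValueError("segment must fit in 16 bits")
    if not 0 <= offset <= 0xFFFF:
        # Stands in for the pseudo-protection fault described in the text.
        raise ValueError("effective address exceeds 65535")
    return (segment << 4) + offset


def real_8086_address(segment: int, offset: int) -> int:
    """The same sum truncated to 20 bits, as on a real 8086."""
    return v86_linear_address(segment, offset) & 0xFFFFF


print(hex(v86_linear_address(0x1234, 0x5678)))  # 0x179b8
print(hex(v86_linear_address(0xFFFF, 0xFFFF)))  # 0x10ffef
print(hex(real_8086_address(0xFFFF, 0xFFFF)))   # 0xffef
```

The second result is the upper bound 10FFEFH quoted above; the third shows how the 8086
drops the carry and wraps to the bottom of memory.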
6.5 Paging in Virtual Mode
Although Protected Mode memory segmentation is not used while the Pentium is
operating in Virtual 8086 mode, the paging portion of the Pentium does work. The paging
hardware allows the concurrent running of multiple Virtual Mode tasks, and provides
protection and operating system isolation. It is not necessary to have the paging hardware
enabled to run Virtual Mode tasks; however, paging is useful or necessary for any of the
following reasons :
1. The paging mechanism is needed in order to run multiple virtual mode tasks, as
shown in Fig. 6.3.
[Figure: the linear addresses (up to FFFFF) of several V86 tasks are mapped through page
tables to separate physical addresses]
Fig. 6.3 Multiple virtual 8086 tasks
2. It is used to relocate the address space of a virtual 8086 mode task to physical
address space greater than one megabyte.
3. The paging mechanism allows the 20-bit linear address produced by a virtual 8086
program to be divided into up to 256 pages. Each one of the pages can be located
anywhere within the maximum 4 Gbyte physical address space of the Pentium.
4. Since CR3 (the page directory base register) is loaded by a task switch, each virtual
8086 mode task can use a different mapping scheme to map pages to different
physical locations.
5. The paging mechanism allows the sharing of the 8086 operating system code between
multiple 8086 applications.
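The effect of per-task page mapping can be sketched in Python. This is a deliberately
simplified model: each task's mapping is a flat dictionary from page number to physical
frame base, whereas the real mechanism uses a page directory and page tables located
through CR3. The names `translate`, `task_a_map` and `task_b_map`, and the frame
addresses, are hypothetical.

```python
PAGE_SIZE = 4096                               # 4 Kbyte pages
PAGES_PER_V86_TASK = (1 << 20) // PAGE_SIZE    # 1 Mbyte / 4 Kbyte = 256


def translate(linear: int, page_map: dict) -> int:
    """Map a linear address to a physical address using a flat page map."""
    page, offset = divmod(linear, PAGE_SIZE)
    return page_map[page] + offset   # KeyError models a not-present page


# Two V86 tasks use the same linear page 0 but different physical frames.
task_a_map = {0: 0x00400000}
task_b_map = {0: 0x00800000}

print(PAGES_PER_V86_TASK)
print(hex(translate(0x0123, task_a_map)))  # 0x400123
print(hex(translate(0x0123, task_b_map)))  # 0x800123
```

Because each task supplies its own map, identical 8086 addresses in two tasks land in
different physical memory, which is what allows several virtual 8086 tasks to coexist.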
6.6 Protection and I/O Permission Bitmap
Virtual 8086 programs or tasks do not make any distinction between code space,
data space, and stack space. There are no upper or lower bounds on a segment; the
address is generated using segment and index registers. There is no such thing as a
not-present segment or a privileged segment. All virtual 8086 mode programs execute at
privilege level 3, the level of least privilege. These programs are limited to the first
1 MB of the linear address space.
Whenever the VM bit is set, the processor is operating in virtual 8086 mode and the
effective CPL is 3. Thus, an attempt to execute a privileged instruction (an instruction
that may be executed only at privilege level 0) in virtual 8086 mode will cause an
exception 13.
We know that the Pentium has several IOPL sensitive instructions. Recall that IOPL is
a 2-bit field in EFLAGS that specifies the minimum privilege level required to execute
certain I/O related instructions. Because a VM86 program has a fixed privilege level of 3,
it is never able to alter the IOPL bits itself; its I/O permission is therefore granted or
denied when its TSS is first created.
In virtual 8086 mode, both the IOPL field and the I/O permission bit map are used, but
both have very different functions than they do in protected mode. In virtual 8086 mode,
IOPL controls the right to execute the following instructions only.
= CLI - Clear interrupt enable flag.
= STI - Set interrupt enable flag.
= LOCK - Asserts Bus Lock Signal.
= PUSHF - Push flags.
= POPF - Pop flags.
= INT n - Software interrupts.
= IRET - Interrupt return.
Note that the actual I/O instructions such as IN, OUT, INS and OUTS are not
controlled by IOPL. Instead, these four instructions are controlled solely by the VM86
task's I/O permission map (if one is defined). If the bit corresponding to the I/O location
being accessed is clear, I/O access is permitted; otherwise, the I/O instruction causes a
general protection fault.
Review Questions
1. Write down the steps to enter and leave the Virtual 8086 Mode.
2. List the instructions in Virtual 8086 Mode.
3. Describe how physical address is obtained in Virtual 8086 Mode.
4. Write notes on
a. Virtual 8086 mode.
b. Multiple virtual 8086 mode tasks.
c. Input/Output in virtual mode.
d. Paging in virtual 8086 mode.
e. Protection and I/O permission bit map in VM86 mode.
Interrupts, Exceptions and I/O
7.1 Introduction
Sometimes it is necessary to have the computer automatically execute one of a
collection of special routines whenever certain conditions exist within a program or the
microcomputer system. For example, the microcomputer system should respond to devices
such as a keyboard, a sensor and other components when they request service.
The most common method of servicing such devices is the polled approach. Here the
processor must test each device in sequence and in effect "ask" each one if it
needs communication with the processor. It is easy to see that a large portion of the main
program is spent looping through this continuous polling cycle. Such a method has a
serious and detrimental effect on system throughput, thus limiting the tasks that can be
assumed by the microcomputer and reducing the cost effectiveness of using such devices.
A more desirable method would be one that allows the microprocessor to execute
its main program and only stop to service peripheral devices when it is told to do so by
the device itself. In effect, the method would provide an external asynchronous input that
would inform the processor that it should complete whatever instruction is currently
being executed and fetch a new routine that will service the requesting device. Once this
servicing is completed, the processor would resume exactly where it left off. This method
is called the interrupt method. It is easy to see that system throughput would drastically
increase, and thus enhance its cost effectiveness. Most microprocessors allow execution of
special routines by interrupting normal program execution. When a microprocessor is
interrupted, it stops executing its current program and calls a special routine which
"services" the interrupt. The event that causes the interruption is called an interrupt and
the special routine which is executed is called the interrupt service routine/procedure
(ISR). An interrupt causes the microprocessor to temporarily suspend execution of the
current program and forces it to jump to another program, the ISR. At the completion of
the ISR, the microprocessor must return to the original program flow at the point where it
was interrupted. A normal program can be interrupted in three ways :
1. By an external signal
2. By a special instruction in the program or
3. By the occurrence of some condition.
To handle such interruptions the Pentium provides two special kinds of control transfers :
Interrupts and Exceptions.
Exceptions differ from interrupts. Interrupts are used to handle asynchronous
events external to the processor, whereas exceptions handle conditions detected by the
processor itself in the course of executing instructions.
1. Interrupts (Hardware)
= Maskable interrupts, which are routed via the INTR pin.
= Nonmaskable interrupts, which are routed via the NMI (Non-Maskable Interrupt)
pin.
Hardware interrupts occur as the result of an external event. These interrupts are
serviced after the execution of the current instruction. After the interrupt handler has
finished servicing the interrupt, execution proceeds with the instruction immediately after
the interrupted instruction.
2. Exceptions
Exceptions are detected by the processor. They are further classified as faults, traps, or
aborts depending on the way they are reported, and whether or not restart of the
instruction causing the exception is supported.
Faults : Faults are exceptions that are detected and serviced before the execution of the
faulting instruction. For example, in a virtual memory system, if the page or segment
referenced by the processor is not present, the operating system fetches the page or
segment from disk using the fault exception routine, and then the Pentium restarts
processing using the referenced page or segment.
Traps : Traps are exceptions that are reported immediately after the execution of the
instruction which causes the problem. User defined interrupts are examples of traps.
Aborts : Aborts are exceptions which do not permit the precise location of the instruction
causing the exception to be determined. Aborts are used to report severe errors, such as
a hardware error, or illegal values in the system tables.
Programmed/Software Interrupts : The instructions INTO, INT 3, INT n and BOUND
can trigger exceptions. These instructions are often called "Software Interrupts", but the
Pentium handles them as exceptions.
7.2 Interrupts and Exception Conditions in Pentium
Interrupt 0 : Divide Error
When the quotient from either a DIV or IDIV instruction is too large to fit in the result
register, the Pentium automatically triggers a type 0 interrupt.
Interrupt 1 : Debug Exceptions
The Pentium triggers this interrupt for one of the following conditions; whether the
exception is a fault or a trap depends on the condition :
= Instruction address breakpoint fault.
= Data address breakpoint trap.
= General detect fault.
= Single-step trap.
= Task-switch breakpoint trap.
The Pentium does not push an error code for this exception. An exception handler
can examine the debug registers to determine which condition caused the exception.
Interrupt 2 : Non Maskable Interrupt
As the name suggests, this interrupt cannot be disabled by any software instruction.
This interrupt is activated by a low to high transition on the Pentium NMI input pin. In
response, the Pentium triggers a type 2 interrupt.
Interrupt 3 : Breakpoint
The type 3 interrupt is used to implement the breakpoint function in the system. The
type 3 interrupt is produced by execution of the INT 3 instruction. The breakpoint function
is often used as a debugging aid in cases where single stepping provides more detail than
wanted. When you insert a breakpoint, the system executes the instructions up to the
breakpoint, and then goes to the breakpoint procedure. In the breakpoint procedure you
can write a program to display register contents, memory contents and other information
that is required to debug your program. You can insert as many breakpoints as you want
in your program.
Interrupt 4 : Overflow Interrupt
The type 4 interrupt is used to check for an overflow condition after any signed arithmetic
operation in the system. The Pentium overflow flag, OF, is set if the signed result of an
arithmetic operation is too large to be represented in the destination register or memory
location.
For example, if you add the 8-bit signed number 0111 1000 (+ 120 decimal) and the 8-bit
signed number 0110 1010 (+ 106 decimal), the result is 1110 0010 (- 30 decimal when read
as a two's complement number). In signed numbers, the MSB (Most Significant Bit)
indicates the sign. In this example, the sum of two positive 8-bit signed numbers appears
negative, because the true sum (+ 226) is too large to fit in the remaining 7 bits. To
detect this condition in the program, you can put the interrupt on overflow instruction,
INTO, immediately after the arithmetic instruction. If the overflow flag is not set when
INTO executes, the instruction will simply function as a NOP (no operation).
However, if the overflow flag is set, indicating an overflow error, the Pentium will trigger
a type 4 interrupt after executing the INTO instruction.
Another way to detect and respond to an overflow error in a program is to put the
jump if overflow instruction (0) immediately after the arithmetic instruction. If the
overflow flag is set as a result of arithmetic operation, execution will jump to the address
specified in the JO instruction. At this address you can put an error routine which
responds in the way you want to the overflow.
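The overflow condition that INTO and JO test can be modelled in a few lines of Python. This is only an illustrative sketch, not Pentium code: the function name add8_signed is hypothetical, and the model assumes 8-bit two's-complement operands, where OF is set when both operands have the same sign and the result has the opposite sign.

```python
def add8_signed(a, b):
    """Add two 8-bit values and model the overflow flag (OF).

    Returns (result_bits, signed_value, overflow).
    """
    a &= 0xFF
    b &= 0xFF
    result = (a + b) & 0xFF
    # OF is set when the result's sign bit differs from the sign bit of both operands.
    overflow = ((a ^ result) & (b ^ result) & 0x80) != 0
    # Interpret the 8-bit pattern as a two's-complement signed number.
    signed_value = result - 256 if result & 0x80 else result
    return result, signed_value, overflow


bits, value, of = add8_signed(0b01111000, 0b01101010)  # (+120) + (+106)
print(f"{bits:08b}", value, of)
```

A program that follows the addition with INTO corresponds to checking the third returned value and calling an error routine only when it is true.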
Interrupt 5 : Bounds Check
The Pentium triggers interrupt 5 if, while executing a BOUND instruction, it finds that the
operand (typically an array index) lies outside the limits specified for that instruction.
Interrupt 6 : Invalid Opcode
This fault occurs when an invalid opcode is detected by the execution unit. (The
exception is not detected until an attempt is made to execute the invalid opcode; i.e.,
prefetching an invalid opcode does not cause this exception). No error code is pushed on
the stack. The exception can be handled within the same task.
This exception also occurs when the type of operand is invalid for the given opcode.
Examples include an intersegment JMP referencing a register operand, or an LES
instruction with a register source operand.
Interrupt 7 : Coprocessor Not Available
This exception occurs in either of two conditions :
= The Pentium encounters an ESC (escape) instruction, and the EM (emulate) bit of
CR0 (control register zero) is set.
= The Pentium encounters either the WAIT instruction or an ESC instruction, and
both the MP (monitor coprocessor) and TS (task switched) bits of CR0 are set.
Interrupt 8 : Double Fault
Normally, when the Pentium detects an exception while trying to invoke the handler
for a prior exception, the two exceptions can be handled serially. If, however, the Pentium
cannot handle them serially, it signals the double-fault exception instead. To determine
when two faults are to be signalled as a double fault, the Pentium divides the exceptions
into three classes : benign exceptions, contributory exceptions, and page faults. Table 7.1
shows this classification. Table 7.2 shows which combinations of exceptions cause a double
fault and which do not.
Class                   | ID | Description
Benign Exceptions       |  1 | Debug exceptions
                        |  2 | NMI
                        |  3 | Breakpoint
                        |  4 | Overflow
                        |  5 | Bounds check
                        |  6 | Invalid opcode
                        |  7 | Coprocessor not available
                        | 16 | Coprocessor error
Contributory Exceptions |  0 | Divide error
                        |  9 | Coprocessor segment overrun
                        | 10 | Invalid TSS
                        | 11 | Segment not present
                        | 12 | Stack exception
                        | 13 | General protection
Page Faults             | 14 | Page fault
Table 7.1 Double-fault detection classes
                        | SECOND EXCEPTION
FIRST EXCEPTION         | Benign Exception | Contributory Exception | Page Fault
Benign Exception        | OK               | OK                     | OK
Contributory Exception  | OK               | DOUBLE                 | OK
Page Fault              | OK               | DOUBLE                 | DOUBLE
Table 7.2 Double-fault definition
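The two tables can be combined into a small lookup. The following Python sketch is illustrative only: the names exception_class and is_double_fault are hypothetical, vector 4 (overflow) is assumed to belong to the benign class, and any vector not listed in Table 7.1 is treated as benign for simplicity.

```python
# Vector numbers grouped by class (Table 7.1); vector 4 is assumed benign.
CONTRIBUTORY = {0, 9, 10, 11, 12, 13}
PAGE_FAULT = {14}


def exception_class(vector):
    """Return the double-fault detection class of an exception vector."""
    if vector in CONTRIBUTORY:
        return "contributory"
    if vector in PAGE_FAULT:
        return "page_fault"
    return "benign"  # assumption: everything else is handled as benign


def is_double_fault(first, second):
    """Apply Table 7.2: does `second`, raised while invoking the handler
    for `first`, produce a double fault (exception 8)?"""
    c1, c2 = exception_class(first), exception_class(second)
    if c1 == "contributory":
        return c2 == "contributory"
    if c1 == "page_fault":
        return c2 in ("contributory", "page_fault")
    return False  # a benign first exception never leads to a double fault


print(is_double_fault(11, 13))  # segment not present, then general protection
print(is_double_fault(13, 14))  # general protection, then page fault
```

The first call reports a double fault and the second does not, matching the "Contributory" row of Table 7.2.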
Interrupt 9 : Reserved by Intel
Interrupt 10 : Invalid TSS
Interrupt 10 occurs if during a task switch the new TSS is invalid. A TSS is
considered invalid in the cases shown in Table 7.3. An error code is pushed onto the stack
to help identify the cause of the fault. The EXT bit indicates whether the exception was
caused by a condition outside the control of the program; e.g., an external interrupt via a
task gate triggered a switch to an invalid TSS.
Error Code           | Condition
TSS id + EXT         | The limit in the TSS descriptor is less than 103
LDT id + EXT         | Invalid LDT selector or LDT not present
SS id + EXT          | Stack segment selector is outside table limit
SS id + EXT          | Stack segment is not a writeable segment
SS id + EXT          | Stack segment DPL does not match new CPL
SS id + EXT          | Stack segment selector RPL <> CPL
CS id + EXT          | Code segment selector is outside table limit
CS id + EXT          | Code segment selector does not refer to code segment
CS id + EXT          | DPL of non-conforming code segment <> new CPL
CS id + EXT          | DPL of conforming code segment > new CPL
DS/ES/FS/GS id + EXT | DS, ES, FS, or GS segment selector is outside table limits
DS/ES/FS/GS id + EXT | DS, ES, FS, or GS is not readable segment
Table 7.3 Conditions that invalidate the TSS
Interrupt 11 : Segment Not Present
Exception 11 occurs when the Pentium detects that the present bit of a descriptor is
zero. The Pentium triggers this fault in any of these cases :
= While attempting to load the CS, DS, ES, FS, or GS registers; loading the SS
register, however, causes a stack fault.
= While attempting to load the LDT register with an LLDT instruction; loading the
LDT register during a task switch operation, however, causes the “invalid TSS”
exception.
= While attempting to use a gate descriptor that is marked not-present.
This fault is restartable. If the exception handler makes the segment present and
returns, the interrupted program will resume execution.
Interrupt 12 : Stack Exception
A stack fault occurs in either of two general conditions :
= As a result of a limit violation in any operation that refers to the SS register. This
includes stack-oriented instructions such as POP, PUSH, ENTER, and LEAVE, as
well as other memory references that implicitly use SS (for example, MOV AX,
[BP + 8]). ENTER causes this exception when the stack is too small for the
indicated local-variable space.
= When attempting to load the SS register with a descriptor that is marked
not-present but is otherwise valid. This can occur in a task switch, an interlevel
CALL, an interlevel return, an LSS instruction, or a MOV or POP instruction to SS.
When the Pentium detects a stack exception, it pushes an error code onto the stack of
the exception handler. If the exception is due to a not-present stack segment or to overflow
of the new stack during an interlevel CALL, the error code contains a selector to the
segment in question (the exception handler can test the present bit in the descriptor to
determine which exception occurred); otherwise the error code is zero.
Interrupt 13 : General Protection Exception
All protection violations that do not cause another exception cause a general protection
exception. This includes
1. Exceeding segment limit when using CS, DS, ES, FS, or GS
2. Exceeding segment limit when referencing a descriptor table
3. Transferring control to a segment that is not executable
4. Writing into a read-only data segment or into a code segment
5. Reading from an execute-only segment
6. Loading the SS register with a read-only descriptor (unless the selector comes from
the TSS during a task switch, in which case a TSS exception occurs).
7. Loading SS, DS, ES, FS, or GS with the descriptor of a system segment
8. Loading DS, ES, FS, or GS with the descriptor of an executable segment that is not
also readable.
9. Loading SS with the descriptor of an executable segment
10. Accessing memory via DS, ES, FS, or GS when the segment register contains a null
selector
11. Switching to a busy task
12. Violating privilege rules
13. Loading CR0 with PG = 1 and PE = 0.
14. Interrupt or exception via trap or interrupt gate from V86 mode to privilege level
other than zero.
15. Exceeding the instruction length limit of 15 bytes (this can occur only if redundant
prefixes are placed before an instruction)
The general protection exception is a fault. In response to a general protection exception, the
Pentium pushes an error code onto the exception handler’s stack. If loading a descriptor
causes the exception, the error code contains a selector to the descriptor; otherwise, the
error code is null.
Interrupt 14 : Page Fault
This exception occurs when paging is enabled ( PG = 1) and the Pentium detects one
of the following conditions while translating a linear address to a physical address :
= The page-directory or page-table entry needed for the address translation has zero
in its present bit.
= The current procedure does not have sufficient privilege to access the indicated
page.
Interrupt 15 : Reserved by Intel.
Interrupt 16 : Coprocessor Error.
The Pentium reports this exception when it detects a signal from the 80287 or 80387
on the Pentium’s ERROR input pin. The Pentium tests this pin only at the beginning of
certain ESC instructions and when it encounters a WAIT instruction while the EM bit of
the MSW is zero (no emulation).
Table 7.4 shows the interrupt and exception summary.
Description                 | Interrupt Number | Return Address Points to Faulting Instruction | Type        | Function that can Generate the Exception
Divide error                | 0                | YES                                           | FAULT       | DIV, IDIV
Debug exceptions            | 1                | YES                                           | TRAP        | Any instruction
NMI                         | 2                | NO                                            | NMI         | INT 2 or NMI
Breakpoint                  | 3                | NO                                            | TRAP        | One-byte INT 3
Overflow                    | 4                | NO                                            | TRAP        | INTO
Bounds check                | 5                | YES                                           | FAULT       | BOUND
Invalid opcode              | 6                | YES                                           | FAULT       | Any illegal instruction
Coprocessor not available   | 7                | YES                                           | FAULT       | ESC, WAIT
Double fault                | 8                | YES                                           | ABORT       | Any instruction that can generate an exception
Coprocessor segment overrun | 9                | NO                                            | ABORT       | Any operand of an ESC instruction that wraps around the end of a segment
Invalid TSS                 | 10               | YES                                           | FAULT       | JMP, CALL, IRET, any interrupt
Segment not present         | 11               | YES                                           | FAULT       | Any segment-register modifier
Stack exception             | 12               | YES                                           | FAULT       | Any memory reference through SS
General protection          | 13               | YES                                           | FAULT/ABORT | Any memory reference or code fetch
Page fault                  | 14               | YES                                           | FAULT       | Any memory reference or code fetch
Reserved by Intel           | 15               | -                                             | -           | -
Coprocessor error           | 16               | YES                                           | FAULT       | ESC, WAIT
Two-byte SW interrupt       | 0-255            | NO                                            | TRAP        | INT n
Table 7.4 Interrupt and exception summary
7.3 Enabling and Disabling Interrupts
Certain conditions and flag settings cause the Pentium to inhibit/mask certain
interrupts and exceptions at instruction boundaries. These conditions and settings are :
7.3.1 NMI Masks Further NMIs
While an NMI handler is executing, the Pentium ignores further interrupt signals at
the NMI pin until the next IRET instruction is executed.
7.3.2 IF Masks INTR
The IF (interrupt-enable flag) controls the acceptance of external interrupts routed via
the INTR pin. When IF = 0, INTR interrupts are masked; when IF = 1, INTR interrupts
are enabled. In response to RESET, the IF flag is cleared. It can be set or reset by the STI and CLI
instructions, respectively. These instructions may be executed only if CPL ≤ IOPL. A
protection exception occurs if they are executed when CPL > IOPL.
7.3.3 RF Masks Debug Faults
The RF bit in EFLAGs controls the recognition of debug faults. This allows debug
faults to be raised for a given instruction at most once, no matter how many times the
instruction is restarted.
7.4 Priority Among Simultaneous Interrupts and Exceptions
If more than one interrupt or exception is pending at an instruction boundary, the
Pentium services one of them at a time. The priority among types of interrupt and
exception sources is shown in Table 7.5. The Pentium first services a pending interrupt or
exception from the type that has the highest priority, transferring control to the first
instruction of the interrupt handler. Lower priority exceptions are discarded; lower
priority interrupts are held pending. Discarded exceptions will be rediscovered when the
interrupt handler returns control to the point of interruption.
Priority | Types of Interrupt or Exception
HIGHEST  | Faults except debug faults
         | Trap instructions INTO, INT n, INT 3
         | Debug traps for this instruction
         | Debug faults for next instruction
         | NMI interrupt
LOWEST   | INTR interrupt
Table 7.5
7.5 Handling Interrupts and Exceptions in Real Mode
The Pentium supports Real Mode interrupts and exceptions much like the 8086. In
Pentium, addresses from 0 through 3FFH (400H memory locations) are dedicated for
Interrupt Descriptor Table (IDT) after Reset. This table contains pointers that define the
starting point of the interrupt service routines. Each pointer in the table requires four bytes
of memory. Thus, it contains up to 256 (4 × 256 = 1024 = 400H) interrupt pointers. Four
bytes in each pointer represent two words. The word having higher memory address holds
the segment base address, whereas the word having lower memory address holds offset.
Fig. 7.1 shows the Interrupt Descriptor Table (IDT). Like 8086, interrupts are recognized by
their numbers/types. Each time an interrupt occurs, the Pentium multiplies the interrupt
number/type by four to generate an index into the interrupt descriptor table.
[Figure: gates for interrupt # 0, interrupt # 1, ... interrupt # n stored at increasing memory addresses, each gate occupying two words, with the IDTR marking the table]
Fig. 7.1 Interrupt descriptor table
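The real-mode lookup described above (multiply the type by four, read the offset from the lower word and the segment from the higher word) can be sketched in Python. This is an illustrative model only: the function names and the sample handler address 2000H:0100H are hypothetical, and the table is assumed to sit at address 0 as it does after Reset.

```python
import struct


def read_real_mode_vector(memory, int_type):
    """Return (segment, offset) of the handler for the given interrupt type.

    `memory` models the first 400H bytes of real-mode memory.
    """
    entry = 4 * int_type  # each pointer occupies four bytes
    # Lower word holds the offset, higher word holds the segment (little-endian).
    offset, segment = struct.unpack_from("<HH", memory, entry)
    return segment, offset


def physical_address(segment, offset):
    """Real-mode address formation: segment * 16 + offset."""
    return (segment << 4) + offset


memory = bytearray(0x400)                                # 256 pointers x 4 bytes
struct.pack_into("<HH", memory, 4 * 3, 0x0100, 0x2000)   # hypothetical type 3 handler

seg, off = read_real_mode_vector(memory, 3)
print(hex(seg), hex(off), hex(physical_address(seg, off)))
```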
In Pentium, the Interrupt Descriptor Table is relocatable. The base address of interrupt
descriptor table is present in the IDTR (Interrupt Descriptor Table Register ). The
programmer can change this address by loading a different address in the IDTR. This is
possible using the LIDT instruction. The LIDT instruction allows the relocation of the base address
and it is also used to specify the size of the IDT. If an interrupt occurs and the
corresponding entry in the interrupt table is beyond the limit stored in IDTR, exception 8
(interrupt table limit too small) will occur. Table 7.6 summarises Pentium Real Address Mode
exceptions.
Interrupt Number | Cause of Exception                                         | Description
0                | DIV, IDIV                                                  | Divide error
1                | All                                                        | Debug exceptions
3                | INT                                                        | Breakpoint
4                | INTO                                                       | Overflow
5                | BOUND                                                      | Bounds check
6                | Any undefined opcode or LOCK used with wrong instruction   | Invalid opcode
7                | ESC or WAIT                                                | Coprocessor not available
8                | INT vector is not within IDTR limit                        | Interrupt table limit too small
9-11             | Reserved                                                   |
12               | Memory operand crosses offset 0 or 0FFFFH                  | Stack fault
13               | Memory operand crosses offset 0FFFFH or attempt to execute | Pseudo-protection exception
                 | past offset 0FFFFH or instruction longer than 15 bytes     |
14, 15           | Reserved                                                   |
16               | ESC or WAIT                                                | Coprocessor error
0-255            | INT n                                                      | Two-byte software interrupt
Table 7.6 Pentium real-address mode exceptions
Note 1: Some debug exceptions point to the faulting instruction, others to the next
instruction. By examining the contents of DR6, it is possible to determine
whether the debug exception is pointing to the faulting instruction or to the next
instruction.
Note 2: The coprocessor errors are reported on the first ESC or WAIT instruction after
the ESC instruction that caused the error.
7.6 Handling Interrupts and Exceptions in Protected Mode
In protected mode, each interrupt or exception is associated with a descriptor which
gives the information about the interrupt service routine. These descriptors are stored in a
special descriptor table called the interrupt descriptor table or IDT. This table can be
located anywhere in memory. Like the GDT and LDTs, the IDT is an array of 8-byte
descriptors. The base address and limit for the interrupt descriptor table are loaded into
the interrupt descriptor table register (IDTR) as shown in Fig. 7.2.
Because there are only 256 identifiers, the IDT need not contain more than 256
descriptors. Three types of descriptors can be used in the IDT.
= Trap gate descriptor
= Interrupt gate descriptor
= Task gate descriptor
If any other type of descriptor is found in the IDT when an exception occurs, the
Pentium generates a general protection fault.
TASK and INTERRUPT Gates
Task gate descriptors are discussed earlier. Trap gate and interrupt gate descriptors are
introduced in this section. These descriptors contain a pointer to segment descriptors. These
are similar to call gates. The most noticeable difference is the absence of a word count
field for passing parameters to the stack.
[Figure: IDT holding one gate descriptor per interrupt, from INT 0 (Divide error), INT 1 (Debug exceptions), INT 2 (Non maskable interrupt), INT 3 (Breakpoint) and INT 4 (Overflow) up to INT 255 (User defined), located by the IDTR]
Fig. 7.2 Interrupt descriptor table and IDTR
[Figure: layout of the three 8-byte gate descriptors.
Pentium TASK GATE : P, DPL, type 00101, SELECTOR; remaining fields not used.
Pentium INTERRUPT GATE : OFFSET 31..16, P, DPL, type 01110, SELECTOR, OFFSET 15..0.
Pentium TRAP GATE : OFFSET 31..16, P, DPL, type 01111, SELECTOR, OFFSET 15..0.]
Fig. 7.3 Pentium IDT gate descriptors
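The fields of a gate descriptor can be packed and unpacked with a short Python sketch. It is illustrative only and assumes the usual 8-byte layout: offset bits 15..0 in bytes 0-1, the selector in bytes 2-3, an unused byte, an access byte holding P, DPL and the 5-bit type, and offset bits 31..16 in bytes 6-7. The function names pack_gate and unpack_gate are hypothetical.

```python
import struct

# 5-bit type codes as they appear in the gate descriptors.
GATE_TYPES = {0b00101: "task", 0b01110: "interrupt", 0b01111: "trap"}


def pack_gate(offset, selector, gate_type, dpl, present=True):
    """Build the 8 bytes of a gate descriptor (assumed layout, see lead-in)."""
    access = (int(present) << 7) | ((dpl & 3) << 5) | (gate_type & 0x1F)
    return struct.pack("<HHBBH",
                       offset & 0xFFFF,           # offset 15..0
                       selector,                  # segment selector
                       0,                         # unused byte
                       access,                    # P, DPL, type
                       (offset >> 16) & 0xFFFF)   # offset 31..16


def unpack_gate(raw):
    """Split an 8-byte gate descriptor back into its fields."""
    off_lo, selector, _unused, access, off_hi = struct.unpack("<HHBBH", raw)
    return {
        "offset": (off_hi << 16) | off_lo,
        "selector": selector,
        "present": bool(access & 0x80),
        "dpl": (access >> 5) & 3,
        "type": GATE_TYPES.get(access & 0x1F, "invalid"),
    }


gate = pack_gate(0x12345678, 0x0008, 0b01110, dpl=0)
print(unpack_gate(gate))
```

A type that is not one of the three listed codes is reported as "invalid" here; on the processor such a descriptor in the IDT leads to a general protection fault.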
When an interrupt or exception occurs, its identifier (a number) is multiplied by 8 and
added to the IDT base address stored in the IDTR. The result is a pointer to a gate
descriptor in the interrupt descriptor table. The gate descriptor can be any of three types :
an interrupt gate, a trap gate, or a task gate. For an interrupt or trap gate, the selector in the
gate picks a segment descriptor from the GDT or LDT. The 32-bit base address from that segment
descriptor and the 32-bit offset from the gate are added to generate the linear address for the
actual interrupt procedure as shown in Fig. 7.4.
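[Figure: the interrupt ID selects a trap gate or interrupt gate in the interrupt descriptor table (IDT); the gate's selector picks a segment descriptor in the LDT or GDT, and the gate's offset locates the entry point within the executable segment]
Fig. 7.4 Interrupt vectoring for procedures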
Then the contents of EFLAGs and information needed for returning to the original
procedure are stored in the stack and control is transferred to interrupt or exception
handling procedure.
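The address arithmetic of protected-mode vectoring can be sketched as follows. This is a simplified Python model, not processor behaviour in full: the function names are hypothetical, the IDTR limit is assumed to be the offset of the last valid byte of the table, and the segment base is assumed to come from the descriptor that the gate's selector picks.

```python
def gate_address(idt_base, idt_limit, vector):
    """Linear address of the gate descriptor for `vector`.

    Raises ValueError when the 8-byte gate does not fit within the IDT limit.
    """
    start = 8 * vector  # each descriptor occupies 8 bytes
    if start + 7 > idt_limit:
        raise ValueError("gate lies beyond the IDT limit")
    return idt_base + start


def handler_linear_address(segment_base, gate_offset):
    """Entry point = base of the target code segment + offset stored in the gate."""
    return (segment_base + gate_offset) & 0xFFFFFFFF


# Hypothetical values: IDT at 10000H with room for all 256 gates (limit 7FFH).
print(hex(gate_address(0x10000, 0x7FF, 3)))
print(hex(handler_linear_address(0x00400000, 0x1234)))
```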
Before the Pentium passes control, however, it pushes at least 12 bytes onto the ISR’s stack.
Fig. 7.5 shows the information that is stacked before control is transferred to the interrupt or
exception handling procedure.
As shown in Fig. 7.5, if an exception causes a special error code, it is saved on the stack.
If the exception requires a privilege change, then the old stack segment and pointer are also saved
on the stack.
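[Figure: stack contents for four cases. No privilege change, no error code : EFLAGS, CS, EIP. No privilege change, error code : EFLAGS, CS, EIP, error code. Privilege change, no error code : old SS, old ESP, EFLAGS, CS, EIP. Privilege change, error code : old SS, old ESP, EFLAGS, CS, EIP, error code.]
Fig. 7.5 Information stored onto the stack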
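The four stack layouts can be summarised by a small Python sketch that lists the items in push order and computes the frame size. It is an illustrative model with hypothetical function names, assuming that every item occupies one 32-bit stack slot.

```python
def interrupt_frame(error_code=False, privilege_change=False):
    """Items pushed before the handler starts, in push order (first pushed first)."""
    items = []
    if privilege_change:
        items += ["old SS", "old ESP"]   # saved only when the privilege level changes
    items += ["EFLAGS", "CS", "EIP"]     # always saved: the 12-byte minimum
    if error_code:
        items.append("error code")       # only for exceptions that supply one
    return items


def frame_size(error_code=False, privilege_change=False):
    """Frame size in bytes, assuming each item takes a 4-byte stack slot."""
    return 4 * len(interrupt_frame(error_code, privilege_change))


for ec in (False, True):
    for pc in (False, True):
        print(ec, pc, frame_size(ec, pc), interrupt_frame(ec, pc))
```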
Trap Gate Vs Interrupt Gate
A trap gate operates exactly like an interrupt gate in all respects except one. When an
exception vectors through a trap gate, all flags remain exactly as they were when the
exception occurred (no change in flag status).
When an exception vectors through an interrupt gate, the Pentium resets IF (Interrupt
Flag) to disable further hardware interrupts, after it pushes the return address and
EFLAGs but before it executes first instruction of the ISR.
7.7 Returning from an Interrupt Procedure
An interrupt procedure differs slightly from a normal procedure, as its method of
leaving the procedure is different. While returning from the
interrupt it is necessary to read EFLAGS from the stack. Thus the IRET instruction is used
to exit from an interrupt procedure instead of the RET instruction. The IRET instruction
increments ESP by an extra four bytes and loads the saved flags into the EFLAGs register.
The IRET instruction also loads CS and EIP to point to the previous procedure from
where it was interrupted. In case of a privilege change it also loads the old stack segment and
pointer.
Processing Interrupt Service Routines
= IDT stores the descriptors for interrupt service routines
= Only trap gate, interrupt gate and task gate descriptors are allowed.
= Operate like programmed procedures/subroutines
= Before transferring control, saves all registers that are used
= ISRs are invoked by interrupts or exceptions instead of CALL instructions
= ISRs terminate with IRET instead of RET instructions.
Privilege levels
Like call gates, interrupt and trap gates have privilege levels associated with them. The
DPL field of a trap, task or interrupt gate determines the minimum privilege level required
to pass through the gate. The CPL must be equal or higher privileged than the gate’s DPL.
It is recommended that the DPL field always be kept at privilege level 3, so that a program
at any privilege level can pass through the gate.
Another condition that must be satisfied to handle the exception is that the DPL of the
exception handler’s code segment must have equal or greater privilege than the CPL; control
cannot be transferred to a less privileged handler.
Exception handler privilege levels
= Exception gate’s DPL must be equal or less privileged than CPL
= Exception code’s DPL must be equal or more privileged than CPL
Task gates
The third alternative for an exception gate is a task gate. When an exception identifier
selects a task gate, the Pentium performs an immediate task switch. The task activated is
determined by the TSS selector stored in the task gate descriptor. The TSS selector of the
current task, which is now dormant, is copied into the Back Link field of the new task.
The new task will have its NT (nested task) bit set in EFLAGs. When the exception
handling task completes and executes an IRET instruction, the Pentium activates the
interrupted task based on the back link information.
Advantage of using task gate over trap and interrupt gates :
= The entire context of the interrupted task is saved automatically.
= The exception handler does not need to be concerned with contaminating the
interrupted code.
= The exception handler can run at any privilege level.
= The exception handler can use its own private code and data space because it can
have its own LDT.
Drawbacks of using task gate over trap and interrupt gates :
= More time is required to perform task switch
= A task gate cannot specify where in the task to begin execution. Dormant tasks
always resume where they left off.
= It is difficult to retrieve any information about the interrupted code when it is in a
different task.
7.8 Interrupts and Exceptions in Virtual 8086 Mode
Hardware interrupts, software exceptions, and processor aborts, traps and faults are
handled differently in virtual 8086 mode than they are in protected mode. The Pentium
operating system determines if the interrupt comes from a protected mode application or
from a virtual 8086 mode program by examining the VM bit in the EFLAGS. In virtual
8086 mode, exception processing uses either an 8086 style ISR or a protected mode ISR.
7.8.1 Protected Mode ISR
When an interrupt or exception occurs in virtual 8086 mode, Pentium first switches
from virtual 8086 mode to protected mode. Then it locates the current task’s TSS (pointed
by TR) and reads the privilege level 0 stack selector and stack pointer. It pushes the
current CS, EIP (32-bit) and EFLAGs (32-bit) onto this stack. This operation is similar to
8086 interrupt processing operation. It also pushes SS and ESP onto the stack. If exception
generates error code then the error code is also saved onto the stack. It also saves all four
data segment registers (DS, ES, FS and GS) and loads zero into these segment registers
before starting to execute the handler procedure. Fig. 7.6 shows protected mode privilege
level 0 stack and virtual 8086 program stack after recognition of the exception but just
before the beginning of the ISR execution.
[Figure: the protected mode privilege level 0 stack holds, in push order, GS, FS, DS, ES, SS, ESP, EFLAGS, CS, EIP and an optional error code; SS:ESP points to EIP when there is no error code and to the error code otherwise. The virtual 8086 program stack, addressed by SS:SP, is left unchanged.]
Fig. 7.6 Protected mode privilege level 0 stack and virtual 8086 program stack
All four data segments are loaded with zeroes because there is a difference between
virtual 8086 memory segmentation and protected mode segmentation. In virtual mode,
segment register gives the segment’s base address whereas in protected mode the contents
of segment register gives 13-bit selector to segment descriptor, one bit to select a descriptor
table and two more bits to determine or affect privilege level protection. If ISR begins with
the virtual 8086 mode values, it is always possible to make wrong memory accesses.
Therefore, it is necessary to flush all six segment registers during the transition, from
virtual 8086 mode to protected mode.
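The difference between the two interpretations of a segment register value can be illustrated with a short Python sketch. The function names are hypothetical; the sketch assumes the usual selector layout, with the descriptor index in bits 15..3, the table indicator in bit 2 and the privilege bits in bits 1..0.

```python
def v86_segment_base(segment):
    """Virtual 8086 mode: the register value itself gives the base (value x 16)."""
    return segment << 4


def decode_selector(selector):
    """Protected mode: the same 16 bits are split into three fields."""
    return {
        "index": selector >> 3,                      # 13-bit descriptor index
        "table": "LDT" if selector & 0x4 else "GDT",  # 1-bit table indicator
        "rpl": selector & 0x3,                       # 2-bit requested privilege level
    }


value = 0x2007  # a value left over from a virtual 8086 program
print(hex(v86_segment_base(value)))   # meaning in virtual 8086 mode
print(decode_selector(value))         # meaning if used unchanged in protected mode
```

The same 16-bit value therefore names a base address in one mode and an unrelated descriptor in the other, which is why stale values must not survive the mode transition.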
Virtual 8086 Mode Exception Processing
= The Pentium automatically changes from Virtual 8086 to Protected mode.
= The Protected mode PL0 stack is used at all times.
= The Virtual 8086 stack remains unused.
= The 32-bit register contents are saved.
= The four data segment registers are saved.
For returning from ISR, Pentium executes an IRET instruction. The Pentium itself
checks VM bit when it encounters an IRET instruction. If it finds VM bit set as it pops
EFLAGs from the stack, it restores the four 8086-style segment registers before returning to
a Virtual 8086 Mode program.
Terminating Virtual 8086 Mode Exception Processing
= The interrupt service routine executes an IRET instruction.
= The Pentium removes information from protected PL0 stack.
= The VM bit in EFLAGS identifies the Virtual 8086 mode caller.
= The Pentium restores four data segment registers.
7.8.2 8086 Style ISR
When an interrupt or exception occurs in virtual 8086 mode, it is possible to use 8086 style
ISRs and access the interrupt vector table. However, the Pentium does not allow this
table to be used directly. Whenever an exception occurs, the Pentium does a task switch and sets the NT bit.
Unfortunately, this NT bit is checked only when IRET is executed in Protected Mode. NT
is ignored in Virtual 8086 Mode. Thus virtual 8086 task cannot return to its parent task
with an IRET instruction. To avoid this, it is necessary to use either a protected mode task
or protected mode procedure to handle all virtual 8086 mode exceptions. The following
steps are involved to handle 8086 style ISR.
1. When an interrupt or exception occurs in virtual 8086 mode, control is given to the
PL0 Protected Mode ISR and the virtual 8086 mode program’s status is pushed
onto the top of the ISR’s stack.
2. The Pentium copies IP, CS, and FLAGs (only 16 bits) from the ISR’s stack onto the
virtual 8086 mode program’s stack, and modifies the virtual 8086 mode program’s
stack pointer.
3. It stores all of the information on the ISR’s stack.
4. It pushes an EFLAGs image (32-bit) with bit 17 (VM) set and bits 12 and 13 (IOPL)
cleared. It also pushes an 8086 style CS segment and EIP.
5. It then executes IRET instruction and terminates the protected mode ISR. Due to
this program control is returned to the 8086 exception handler and not to the
interrupted program.
6. It executes 8086 ISR and finally IRET instruction to generate a general protection
fault.
7. The general protection handler then transfers control to the first ISR (Protected
mode ISR).
8. Then Pentium reads the IP, CS and FLAGS from the virtual 8086 mode program's
stack and adjusts SP accordingly.
9. It then reads information stored in step 3 from the ISR’s stack and executes
another IRET instruction. This terminates protected mode ISR and gets the return
address and EFLAGs register status.
10. At the end, the Pentium resumes the execution of interrupted program.
Using 8086 Interrupt Service Routines
= The Pentium cannot directly utilize 8086 ISR code.
= The exception handler must run in Protected mode.
= The exception handler can “reflect” the interrupt to an 8086 ISR.
= The 8086 ISR must return to the Protected mode exception handler.
= The exception handler returns to the interrupted code.
7.9 I/O Handling in Pentium
In addition to transferring data to and from external memory, Pentium processors can
also transfer data to and from input/output ports (I/O ports). I/O ports are created in
system hardware by circuitry that decodes the control, data, and address pins on the
processor. These I/O ports are then configured to communicate with peripheral devices.
An I/O port can be an input port, an output port, or a bidirectional port. Some I/O ports
are used for transmitting data, such as to and from the transmit and receive registers,
respectively, of a serial interface device. Other I/O ports are used to control peripheral
devices, such as the control registers of a disk controller.
This section describes the processor's I/O architecture. The topics discussed include:
= I/O port addressing
= I/O instructions
= I/O protection mechanism
7.9.1 I/O Port Addressing
The Pentium processor permits applications to access I/O ports in either of two ways:
= Through a separate I/O address space
= Through memory-mapped I/O.
Accessing I/O ports through the I/O address space is handled through a set of I/O
instructions and a special I/O protection mechanism. Accessing I/O ports through
memory-mapped I/O is handled with the processor's general-purpose move and string
instructions, with protection provided through segmentation or paging. I/O ports can be
mapped so that they appear in the I/O address space or the physical-memory address
space (memory mapped I/O) or both.
One benefit of using the I/O address space is that writes to I/O ports are guaranteed
to be completed before the next instruction in the instruction stream is executed. Thus, I/O
writes to control system hardware cause the hardware to be set to its new state before any
other instructions are executed.
7.9.2 I/O Port Hardware
From a hardware point of view, I/O addressing is handled through the processor's
address lines. In Pentium processors, the M/IO# pin indicates a memory address (1) or an
I/O address (0). When the separate I/O address space is selected, it is the responsibility of
the hardware to decode the memory-I/O bus transaction to select I/O ports rather than
memory. Data is transmitted between the processor and an I/O device through the data
lines.
7.9.3 I/O Address Space
The processor's I/O address space is separate and distinct from the physical-memory
address space. The I/O address space consists of 2^16 (64K) individually addressable 8-bit
I/O ports, numbered 0 through FFFFH. I/O port addresses 0F8H through 0FFH are
reserved. We should not assign I/O ports to these addresses. Any two consecutive 8-bit
ports can be treated as a 16-bit port, and any four consecutive ports can be a 32-bit port.
In this manner, the processor can transfer 8, 16, or 32 bits to or from a device in the I/O
address space. Like words in memory, 16-bit ports should be aligned to even addresses (0,
2, 4, ...) so that all 16 bits can be transferred in a single bus cycle. Likewise, 32-bit ports
should be aligned to addresses that are multiples of four (0, 4, 8, ...). The Pentium
processor supports data transfers to unaligned ports, but there is a performance penalty
because one or more extra bus cycles are required to complete the data transfer. If hardware
or software requires that I/O ports be written to in a particular order, that order must be
specified explicitly. For example, to load a word-length I/O port at address 2H and then
another word port at 4H, two word-length writes must be used, rather than a single
doubleword write at 2H. Note that the processor does not mask parity errors for bus
cycles to the I/O address space. Accessing I/O ports through the I/O address space is
thus a possible source of parity errors.
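The addressing rules of this section can be collected in a small Python check. This is an illustrative helper with a hypothetical name, check_port; it only reports whether a port access of a given width lies within the 64K I/O space, is naturally aligned, and touches the reserved range 0F8H through 0FFH.

```python
def check_port(port, width_bytes):
    """Classify an I/O access of 1, 2 or 4 bytes starting at `port`."""
    if width_bytes not in (1, 2, 4):
        raise ValueError("ports are 8, 16 or 32 bits wide")
    last = port + width_bytes - 1
    if port < 0 or last > 0xFFFF:
        raise ValueError("access falls outside the 64K I/O address space")
    return {
        # Aligned ports can be transferred in a single bus cycle.
        "aligned": port % width_bytes == 0,
        # True when any byte of the access lies in the reserved range.
        "reserved": port <= 0xFF and last >= 0xF8,
    }


print(check_port(0x02, 2))   # word port at an even address
print(check_port(0x02, 4))   # doubleword at 2H: not a multiple of four
print(check_port(0xF6, 4))   # overlaps the reserved addresses
```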
7.9.4 Memory-Mapped 1/0
I/O devices that respond like memory components can be accessed through the
Pentium processor's physical-memory address space as shown in Fig. 7.7. When using
memory-mapped I/O, any of the Pentium processor's instructions that reference memory
can be used to access an I/O port located at a physical-memory address. For example, the
MOV instruction can transfer data between any register and a memory-mapped I/O port.
The AND, OR, and TEST instructions may be used to manipulate bits in the control and
status registers of a memory-mapped peripheral device.
[Figure: physical-memory address space, extending up to FFFF FFFFH]
Fig. 7.7 Memory-mapped I/O
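The bit manipulation described above can be illustrated in C, where a memory-mapped register is reached through a volatile pointer. This is only a sketch under stated assumptions: the bit names CTRL_ENABLE and STATUS_READY and the helper names are hypothetical, and on a host machine the "register" would be an ordinary variable standing in for a device. A compiler would typically translate these helpers into the OR, AND and TEST style operations mentioned in the text.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit assignments for an imaginary peripheral. */
#define CTRL_ENABLE  0x01u
#define STATUS_READY 0x80u

/* OR: set the bits selected by `mask`, leaving the others unchanged. */
static void reg_set_bits(volatile uint8_t *reg, uint8_t mask)
{
    assert(reg != NULL);
    *reg = (uint8_t)(*reg | mask);
}

/* AND: clear the bits selected by `mask`, leaving the others unchanged. */
static void reg_clear_bits(volatile uint8_t *reg, uint8_t mask)
{
    assert(reg != NULL);
    *reg = (uint8_t)(*reg & (uint8_t)~mask);
}

/* TEST: non-destructive check; nonzero if any bit in `mask` is set. */
static int reg_test_bits(const volatile uint8_t *reg, uint8_t mask)
{
    assert(reg != NULL);
    return (*reg & mask) != 0;
}
```

The volatile qualifier tells the compiler that every read and write of the register must actually be performed; it does not by itself prevent the processor from caching the location.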
When using memory-mapped I/O, caching of the address space mapped for I/O
operations must be prevented. The Pentium provides the KEN# pin, which when held
inactive (high) prevents caching of all addresses sent out on the system bus. To use this
pin, external address decoding logic is required to block caching in specific address spaces.
The Pentium processor also provides the PCD (page-level cache disable) flag in page table
and page directory entries. This flag allows caching to be disabled on a page-by-page
basis.
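As an illustration of the page-by-page control, the sketch below builds a 32-bit page table entry with the PCD flag set. It is not taken from the original text: the bit positions (P in bit 0, R/W in bit 1, PCD in bit 4) follow the standard 32-bit x86 page table entry layout, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Flag positions in a 32-bit page table or page directory entry. */
#define PTE_PRESENT  (1u << 0)  /* P: page is present */
#define PTE_WRITABLE (1u << 1)  /* R/W: writes allowed */
#define PTE_PCD      (1u << 4)  /* PCD: page-level cache disable */

/* Build an entry for a 4 KB page that maps device registers:
   present, writable, and with caching disabled. */
static uint32_t pte_for_mmio(uint32_t frame_base)
{
    assert((frame_base & 0xFFFu) == 0);  /* frame must be 4 KB aligned */
    return frame_base | PTE_PRESENT | PTE_WRITABLE | PTE_PCD;
}

/* True when caching is disabled for the page described by `entry`. */
static bool pte_caching_disabled(uint32_t entry)
{
    return (entry & PTE_PCD) != 0;
}
```

Under these assumptions, pte_for_mmio(0x000B8000u) yields 0x000B8013, which marks that page as uncacheable without affecting any other page.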
7.9.5 I/O Instructions
The processor's I/O instructions provide access to I/O ports through the I/O address
space. (These instructions cannot be used to access memory-mapped I/O ports.) There are
two groups of I/O instructions:
- Those that transfer a single item (byte, word, or doubleword) between an I/O port
  and a general-purpose register.
- Those that transfer strings of items (strings of bytes, words, or doublewords)
  between an I/O port and memory.
The register I/O instructions IN (input from I/O port) and OUT (output to I/O port)
move data between I/O ports and the EAX register (32-bit I/O), the AX register (16-bit
I/O), or the AL (8-bit I/O) register. The address of the I/O port can be given with an
immediate value or a value in the DX register.
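Because IN and OUT cannot normally be executed from an ordinary user program, the behaviour of the three operand widths can be modelled on a host machine. The sketch below is such a model, assuming a simulated port array (io_space) rather than real hardware; the names port_in and port_out are hypothetical. It shows how AL, AX and EAX transfers map onto one, two or four consecutive byte-wide ports, with the low-order byte at the lowest port address.

```c
#include <assert.h>
#include <stdint.h>

#define IO_SPACE_SIZE 0x10000u            /* 2^16 byte-wide ports */

static uint8_t io_space[IO_SPACE_SIZE];   /* simulated ports, not real hardware */

/* Model of OUT with AL, AX or EAX: write `width` bytes (1, 2 or 4),
   low-order byte to the lowest-numbered port. */
static void port_out(uint16_t port, uint32_t value, unsigned width)
{
    assert(width == 1 || width == 2 || width == 4);
    assert((uint32_t)port + width <= IO_SPACE_SIZE);
    for (unsigned i = 0; i < width; i++)
        io_space[port + i] = (uint8_t)(value >> (8u * i));
}

/* Model of IN into AL, AX or EAX: read `width` consecutive ports and
   assemble them into one value. */
static uint32_t port_in(uint16_t port, unsigned width)
{
    assert(width == 1 || width == 2 || width == 4);
    assert((uint32_t)port + width <= IO_SPACE_SIZE);
    uint32_t value = 0;
    for (unsigned i = 0; i < width; i++)
        value |= (uint32_t)io_space[port + i] << (8u * i);
    return value;
}
```

In this model, writing 12345678H as a doubleword to port 300H and then reading a byte from 300H returns 78H, while reading a word from 302H returns 1234H.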
The string I/O instructions INS (input string from I/O port) and OUTS (output string
to I/O port) move data between an I/O port and a memory location. The address of the
I/O port being accessed is given in the DX register; the source memory address (for OUTS)
is given in DS:ESI and the destination memory address (for INS) in ES:EDI.
When used with one of the repeat prefixes (such as REP), the INS and OUTS
instructions perform string (or block) input or output operations. The repeat prefix REP
modifies the INS and OUTS instructions to transfer blocks of data between an I/O port
and memory. Here, the ESI or EDI register is incremented or decremented (according to
the setting of the DF flag in the EFLAGS register) after each byte, word, or doubleword is
transferred between the selected I/O port and memory. See the individual references for
the IN, INS, OUT, and OUTS instructions in Chapter 3.
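The effect of the REP prefix and the DF flag on a byte-wide INS can be sketched as a small host-side model. Everything here is illustrative: read_data_port is a hypothetical stand-in for a device that returns consecutive byte values, and rep_insb models only the index update and the repeated transfer, not segmentation or privilege checks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a device data port: each read returns the
   next value in sequence. */
static uint8_t next_byte;
static uint8_t read_data_port(void) { return next_byte++; }

/* Model of REP INSB: read `count` bytes from one port into `mem`,
   starting at index `edi`. df == 0 increments the index after each
   byte, df != 0 decrements it. Returns the final index value. */
static ptrdiff_t rep_insb(uint8_t *mem, size_t mem_size,
                          ptrdiff_t edi, size_t count, int df)
{
    assert(mem != NULL);
    ptrdiff_t step = df ? -1 : 1;
    while (count-- > 0) {
        assert(edi >= 0 && (size_t)edi < mem_size);
        mem[edi] = read_data_port();
        edi += step;
    }
    return edi;
}
```

With DF cleared, the buffer is filled from low to high addresses; with DF set, the same port data lands in memory in the opposite order, which is why the flag should be set explicitly before a string I/O operation.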
7.9.6 Protected-Mode I/O
When the processor is running in protected mode, the following protection
mechanisms regulate access to I/O ports.
- When accessing I/O ports through the I/O address space, two protection mechanisms
  control access:
    - The I/O privilege level (IOPL) field in the EFLAGS register
    - The I/O permission bit map of a task state segment (TSS)
- When accessing memory-mapped I/O ports, the normal segmentation and paging
  protection also affect access to I/O ports.
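The way the IOPL field and the I/O permission bit map combine can be sketched as follows. The rule modelled here is not spelled out in this section and is taken from the general x86 protected-mode definition (outside virtual-8086 mode): an access is allowed outright when the current privilege level (CPL) is numerically less than or equal to IOPL; otherwise every port touched by the access must have its bit clear in the bit map, and ports beyond the end of the map are treated as denied. The function name and parameters are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Model of the protected-mode check for an I/O instruction.
   `bitmap` holds one bit per port (bit set = access denied);
   `width` is the access size in bytes (1, 2 or 4). */
static bool io_access_allowed(unsigned cpl, unsigned iopl,
                              const uint8_t *bitmap, size_t bitmap_bytes,
                              uint16_t port, unsigned width)
{
    assert(cpl <= 3 && iopl <= 3);
    assert(width == 1 || width == 2 || width == 4);

    if (cpl <= iopl)
        return true;                 /* privileged enough: bit map not consulted */

    for (unsigned i = 0; i < width; i++) {
        uint32_t p = (uint32_t)port + i;
        if (bitmap == NULL || p / 8u >= bitmap_bytes)
            return false;            /* port lies beyond the bit map */
        if (bitmap[p / 8u] & (1u << (p % 8u)))
            return false;            /* bit set: port not permitted */
    }
    return true;
}
```

Note that a word or doubleword access is refused if any one of its byte ports is denied, so a 16-bit access at 5FH fails when only port 60H is protected.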
7.9.7 Ordering I/O
When controlling I/O devices it is often important that memory and I/O operations be
carried out in precisely the order programmed. For example, a program may write a
command to an I/O port, then read the status of the I/O device from another I/O port. It
is important that the status returned be the status of the device after it receives the
command, not before.
When using memory-mapped I/O, caution should be taken to avoid situations in
which the programmed order is not preserved by the processor. To optimize performance,
the processor allows cacheable memory reads to be reordered ahead of buffered writes in
most situations.
Internally, processor reads (cache hits) can be reordered around buffered writes. When
using memory-mapped I/O, therefore, it is possible that an I/O read might be performed
before the memory write of a previous instruction. Thus, it is recommended that
caching of the address space mapped for I/O operations be prevented. Refer to section
7.9 for more details.
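The hazard described above can be made concrete with a toy model. The sketch below is purely illustrative and does not represent the Pentium's actual write buffer: device_model, STATUS_DONE and the function names are hypothetical. It shows a command write that is still waiting in a buffer, a status read that passes it (and therefore sees the old status), and a status read performed in program order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define STATUS_IDLE 0x00u   /* hypothetical status before any command */
#define STATUS_DONE 0x80u   /* hypothetical status after a command */

typedef struct {
    uint8_t status;          /* device status register */
    bool    write_buffered;  /* a command write is waiting in the buffer */
    uint8_t buffered_cmd;    /* the command that has not reached the device */
} device_model;

/* The command write is posted to a buffer and has not reached the device yet. */
static void post_command(device_model *d, uint8_t cmd)
{
    assert(d != NULL);
    d->buffered_cmd = cmd;
    d->write_buffered = true;
}

/* The buffered write reaches the device, which then updates its status. */
static void drain_write_buffer(device_model *d)
{
    assert(d != NULL);
    if (d->write_buffered) {
        d->status = STATUS_DONE;
        d->write_buffered = false;
    }
}

/* A read that passes the buffered write: returns the status from before
   the command was received. */
static uint8_t read_status_reordered(const device_model *d)
{
    assert(d != NULL);
    return d->status;
}

/* A read performed in program order: the write completes first. */
static uint8_t read_status_in_order(device_model *d)
{
    drain_write_buffer(d);
    return d->status;
}
```

In this model the reordered read returns STATUS_IDLE even though the command was issued first, which is exactly the situation that uncached mapping of the I/O region is meant to rule out.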
Review Questions
1. Compare the polling and interrupt methods.
2. Give a one-line explanation of the following terms with respect to the Pentium:
   a) Faults, b) Traps, c) Aborts
3. Describe interrupts and exceptions in Real Mode.
4. Explain protected mode interrupt processing.
5. What is the difference between a trap gate and an interrupt gate?
6. Discuss the advantages and disadvantages of a task gate used for interrupt processing over
   a trap gate and an interrupt gate.
7. Describe interrupts and exceptions in Virtual 8086 Mode.
8. Write a short note on I/O handling in the Pentium.