ARM Processors for EIE Students
TEACHER : G N SRIKANTH
CLASS : 5Th SEM
SUBJECT/CODE : ARM PROCESSOR 21EI53
DEPT : ELECTRONICS AND INSTRUMENTATION ENGINEERING
COLLEGE : R N S INSTITUTE OF TECHNOLOGY
Module-1
ARM Embedded Systems [ PART-A ]
Introduction, RISC design philosophy, ARM design philosophy, Embedded system hardware
– AMBA bus protocol, ARM bus technology, Memory, Peripherals, Embedded system
software – Initialization (BOOT) code, Operating System, Applications.
MODULE-1 PART-A
The ARM processor core is a key component of many successful 32-bit embedded
systems. You probably own one yourself and may not even realize it! ARM cores are widely
used in mobile phones, handheld organizers, and a multitude of other everyday portable
consumer devices.
The ARM core is not a single core but a whole family of designs sharing similar design
principles and a common instruction set.
For example, one of ARM’s most successful cores is the ARM7TDMI. It provides up to
120 Dhrystone MIPS and is known for its high code density and low power consumption,
making it ideal for mobile embedded devices.
(Dhrystone MIPS version 2.1 is a small benchmarking program.)
1. Instructions—
RISC processors have a reduced number of instruction classes. These classes provide
simple operations that can each execute in a single cycle.
2. Pipelines—
Instruction processing is broken down into smaller pipeline stages that can proceed in
parallel, so the processor can start a new instruction on every cycle.
3. Registers—
RISC machines provide a large set of general-purpose registers; any register can hold either
data or an address, so data can be kept close to the core.
4. Load-store architecture—
The processor operates on data held in registers.
Separate load and store instructions transfer data between the register bank and external
memory.
Memory accesses are costly, so separating memory accesses from data processing
provides an advantage because you can use data items held in the register bank multiple
times without needing multiple memory accesses (a short C sketch below illustrates this).
These design rules allow a RISC processor to be simpler, and thus the core can operate
at higher clock frequencies.
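As a quick illustration of the load-store rule (item 4 above), here is a minimal C sketch; the function name and the comments about register allocation are illustrative, not taken from the notes:

int sum_array(const int *a, int n)
{
    int sum = 0;                  /* 'sum' stays in a register and is reused on every pass */
    for (int i = 0; i < n; i++) {
        sum += a[i];              /* only a[i] needs a load from memory each iteration     */
    }
    return sum;                   /* the result is returned in a register, no extra stores */
}

On a load-store machine the compiler keeps sum in a register for the whole loop, so the data item is used many times with only one memory access per array element.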
[However, note that ARM is neither pure RISC nor pure CISC (complex instruction set
computing).]
Conditional execution—
An instruction is only executed when a specific condition has been satisfied. This
feature improves performance and code density by reducing branch instructions.
Enhanced instructions—
The enhanced digital signal processor (DSP) instructions were added to the
standard ARM instruction set to support fast 16×16-bit multiplier operations and
saturation. In some cases these instructions allow a faster-performing ARM processor to
replace the traditional combination of a processor plus a DSP.
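Returning to the conditional execution feature above, here is a hedged C sketch; the exact instructions a compiler emits will vary, but the CMP/RSBLT pair shown in the comment is a standard ARM idiom:

/* For this function an ARM compiler can emit
 *     CMP   r0, #0
 *     RSBLT r0, r0, #0     ; executed only when r0 < 0
 * instead of a compare-and-branch sequence, improving both code
 * density and performance.
 */
int absolute(int x)
{
    if (x < 0)
        x = -x;
    return x;
}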
These additional features have made the ARM processor one of the most commonly used 32-bit
embedded processor cores. Many of the top semiconductor companies around the world produce
products based around the ARM processor.
Embedded systems can control many different devices, from small sensors found on a
production line, to the real-time control systems used on a NASA space probe. All these devices
use a combination of software and hardware components. Each component is chosen for
efficiency and, if applicable, is designed for future extension and expansion.
Figure 1.2 shows a typical embedded device based on an ARM core. Each box represents
a feature or function. The lines connecting the boxes are the buses carrying data. We can
separate the device into four main hardware components:
The ARM processor controls the embedded device. Different versions of the ARM
processor are available to suit the desired operating characteristics. An ARM processor
comprises a core (the execution engine that processes instructions and manipulates data)
plus the surrounding components that interface it with a bus. These components can
include memory management and caches.
Controllers coordinate important functional blocks of the system. Two commonly found
controllers are interrupt and memory controllers.
The peripherals provide all the input-output capability external to the chip and are
responsible for the uniqueness of the embedded device.
A bus is used to communicate between different parts of the device.
Embedded systems use different bus technologies than those designed for x86 PCs. The most
common PC bus technology, the Peripheral Component Interconnect (PCI) bus, connects
such devices as video cards and hard disk controllers to the x86 processor bus. This type
of technology is external or off-chip (i.e., the bus is designed to connect mechanically and
electrically to devices external to the chip) and is built into the motherboard of a PC.
In contrast, embedded devices use an on-chip bus that is internal to the chip and that
allows different peripheral devices to be interconnected with an ARM core.
There are two different classes of devices attached to the bus. The ARM processor core is
a bus master—a logical device capable of initiating a data transfer with another device across
the same bus. Peripherals tend to be bus slaves—logical devices capable only of responding
to a transfer request from a bus master device.
A bus has two architecture levels. The first is a physical level that covers the electrical
characteristics and bus width (16, 32, or 64 bits). The second level deals with protocol—the
logical rules that govern the communication between the processor and a peripheral.
The Advanced Microcontroller Bus Architecture (AMBA) has been widely adopted as the on-
chip bus architecture used for ARM processors.
The first AMBA buses introduced were the ARM System Bus (ASB) and the ARM
Peripheral Bus (APB).
Later ARM introduced another bus design, called the ARM High Performance Bus
(AHB). Using AMBA, peripheral designers can reuse the same design on multiple projects.
Because there are a large number of peripherals developed with an AMBA interface, hardware
designers have a wide choice of tested and proven peripherals for use in a device.
AHB provides higher data throughput than ASB because it is based on a centralized
multiplexed bus scheme rather than the ASB bidirectional bus design. This change
allows the AHB bus to run at higher clock speeds and to be the first ARM bus to support
widths of 64 and 128 bits.
ARM has introduced two variations on the AHB bus: Multi-layer AHB and AHB-Lite. In
contrast to the original AHB, which allows only a single bus master to be active on the bus at
any time, the Multi-layer AHB bus allows multiple active bus masters. AHB-Lite is a subset of
the AHB bus that is limited to a single bus master. This bus was developed for designs that do
not require the full features of the standard AHB bus.
AHB and Multi-layer AHB support the same protocol for master and slave but have
different interconnects. The new interconnects in Multi-layer AHB are good for systems with
multiple processors. They permit operations to occur in parallel and allow for higher throughput
rates.
The example device shown in Figure 1.2 has three buses: an AHB bus for the high-performance
peripherals, an APB bus for the slower peripherals, and a third bus for external peripherals,
proprietary to this device. This external bus requires a specialized bridge to connect with the
AHB bus.
1.3.3 Memory
An embedded system has to have some form of memory to store and execute code. You have to
compare price, performance, and power consumption when deciding upon specific memory
characteristics, such as hierarchy, width, and type. For example, if memory has to run twice as
fast to maintain a desired bandwidth, then the memory power requirement may be higher.
1.3.3.1 Hierarchy
All computer systems have memory arranged in some form of hierarchy. Figure 1.2 shows
a device that supports external off-chip memory. Internal to the processor there is an option
of a cache (not shown in Figure 1.2) to improve memory performance.
Figure 1.3 shows the memory trade-offs: the fastest memory cache is physically located
nearer the ARM processor core and the slowest secondary memory is set further away.
The closer the memory is to the processor core, the more it costs and the smaller its capacity. The
cache is placed between main memory and the core. It is used to speed up data transfer between
the processor and main memory. A cache provides an overall increase in performance but with a
loss of predictable execution time. Although the cache increases the general performance of the
system, it does not help real-time system response. Note that many small embedded systems do
not require the performance benefits of a cache.
The main memory is large—around 256 KB to 256 MB (or even greater), depending on the
application—and is generally stored in separate chips. Load and store instructions access the
main memory unless the values have been stored in the cache for fast access. Secondary storage
is the largest and slowest form of memory. Hard disk drives and CD-ROM drives are examples
of secondary storage. These days secondary storage may vary from 600 MB to 60 GB.
1.3.3.2 Width
The memory width is the number of bits the memory returns on each access—typically 8, 16, 32,
or 64 bits. The memory width has a direct effect on the overall performance and cost ratio.
If you have an uncached system using 32-bit ARM instructions and 16-bit-wide memory chips,
then the processor will have to make two memory fetches per instruction. Each fetch requires
two 16-bit loads. This obviously has the effect of reducing system performance, but the benefit is
that 16-bit memory is less expensive.
In contrast, if the core executes 16-bit Thumb instructions, it will achieve better performance
with a 16-bit memory. The higher performance is a result of the core making only a single fetch
to memory to load an instruction. Hence, using Thumb instructions with 16-bit-wide memory
devices provides both improved performance and reduced cost.
Table 1.1 summarizes theoretical cycle times on an ARM processor using different
memory width devices.
Flash ROM: can be written to as well as read, but it is slow to write, so you shouldn’t use it for
holding dynamic data. Its main use is for holding the device firmware or storing long-term data
that needs to be preserved after the power is turned off. The erasing and writing of flash ROM are
completely software controlled with no additional hardware circuitry required, which reduces the
manufacturing costs. Flash ROM has become the most popular of the read-only memory types
and is currently being used as an alternative for mass or secondary storage.
Dynamic random access memory (DRAM): is the most commonly used RAM for devices. It
has the lowest cost per megabyte compared with other types of RAM. DRAM is dynamic: it
needs to have its storage cells refreshed and given a new electronic charge every few
milliseconds, so you need to set up a DRAM controller before using the memory.
Static random access memory (SRAM): is faster than the more traditional DRAM, but requires
more silicon area. SRAM is static—the RAM does not require refreshing. The access time for
SRAM is considerably shorter than the equivalent DRAM because SRAM does not require a
pause between data accesses. Because of its higher cost, it is used mostly for smaller high-speed
tasks, such as fast memory and caches.
Synchronous dynamic random access memory (SDRAM): is one of many subcategories of
DRAM. It can run at much higher clock speeds than conventional memory. SDRAM
synchronizes itself with the processor bus because it is clocked. Internally the data is fetched
from memory cells, pipelined, and finally brought out on the bus in a burst. The old-style DRAM
is asynchronous, so does not burst as efficiently as SDRAM.
1.3.4 Peripherals
The interaction of embedded systems with the outside world is possible only with peripheral
devices.
A peripheral device performs input and output functions for the chip by connecting to other
devices or sensors that are off-chip. Each peripheral device usually performs a single function
and may reside on-chip. Peripherals range from a simple serial communication device to a more
complex 802.11 wireless device.
All ARM peripherals are memory mapped—the programming interface is a set of
memory-addressed registers. The address of these registers is an offset from a specific peripheral
base address.
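To make the memory-mapped idea concrete, here is a small C sketch; the peripheral name, base address, register offsets, and status bit below are hypothetical placeholders for values that would come from a real chip’s memory map:

#include <stdint.h>

#define UART0_BASE   0x16000000u                                  /* hypothetical base address */
#define UART0_DATA   (*(volatile uint32_t *)(UART0_BASE + 0x00))  /* data register offset      */
#define UART0_STATUS (*(volatile uint32_t *)(UART0_BASE + 0x04))  /* status register offset    */
#define TX_READY     (1u << 5)                                    /* hypothetical status bit   */

void uart_putc(char c)
{
    while ((UART0_STATUS & TX_READY) == 0)
        ;                                  /* poll until the transmitter is free       */
    UART0_DATA = (uint32_t)c;              /* write the character to the data register */
}

The volatile qualifier tells the compiler that every access must really reach the peripheral register and cannot be cached in a processor register or optimized away.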
Controllers are specialized peripherals that implement higher levels of functionality
within an embedded system.
Two important types of controllers are:
1. Memory controllers
2. Interrupt controllers.
1.3.4.1 Memory Controllers
Memory controllers connect different types of memory to the processor bus. On
power-up a memory controller is configured in hardware to allow certain memory devices to be
active. These memory devices allow the initialization code to be executed.
Some memory devices must be set up by software; for example, when using DRAM, you
first have to set up the memory timings and refresh rate before it can be accessed.
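A hedged sketch of that DRAM set-up step is shown below; the controller register names, offsets, and values are invented for illustration, since the real ones depend entirely on the memory controller used:

#include <stdint.h>

#define MEMCTL_BASE    0x10000000u                                 /* hypothetical controller base */
#define MEMCTL_TIMING  (*(volatile uint32_t *)(MEMCTL_BASE + 0x00))
#define MEMCTL_REFRESH (*(volatile uint32_t *)(MEMCTL_BASE + 0x04))

void dram_init(void)
{
    MEMCTL_TIMING  = 0x00000233u;   /* example: CAS latency and RAS-to-CAS timing fields    */
    MEMCTL_REFRESH = 0x00000600u;   /* example: refresh interval in controller clock ticks  */
}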
1.3.4.2 Interrupt Controllers
When a peripheral or device requires attention, it raises an interrupt to the processor. A
standard interrupt controller sends an interrupt signal to the processor core when an external
device requests servicing, and it can be programmed to ignore or mask an individual device or
set of devices. The interrupt handler determines which device requires servicing by reading a
device bitmap register in the interrupt controller.
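The device bitmap idea can be sketched in C as follows; the register address, bit assignments, and handler names are assumptions for illustration only:

#include <stdint.h>

#define INTC_STATUS (*(volatile uint32_t *)0x14000000u)   /* hypothetical bitmap register */

extern void timer_isr(void);   /* handlers provided elsewhere (illustrative names) */
extern void uart_isr(void);

void irq_dispatch(void)
{
    uint32_t pending = INTC_STATUS;    /* one bit set per device requesting service */

    if (pending & (1u << 0))
        timer_isr();                   /* bit 0 assumed to be the timer */
    if (pending & (1u << 1))
        uart_isr();                    /* bit 1 assumed to be the UART  */
}

The VIC described next removes much of this software dispatch step, because the controller itself can supply the handler for the highest-priority pending interrupt.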
The VIC is more powerful than the standard interrupt controller because it prioritizes
interrupts and simplifies the determination of which device caused the interrupt. After
associating a priority and a handler address with each interrupt, the VIC only asserts an interrupt
signal to the core if the priority of a new interrupt is higher than the currently executing interrupt
handler. Depending on its type, the VIC will either call the standard interrupt exception handler,
which can load the address of the handler for the device from the VIC, or cause the core to jump
to the handler for the device directly.
[The software components can run from ROM or RAM. ROM code that is fixed on the device
(for example, the initialization code) is called firmware.]
Initialization code (or boot code) takes the processor from the reset state to a state where
the operating system can run.
It configures
1) Memory controller
2) Processor caches
Booting an image is the final phase, but first you must load the image (s/w).
Loading an image can range from copying an entire program, including code and data, into
RAM, to just copying a data area containing volatile variables into RAM.
Once booted, the system hands over control by modifying the program counter to point
to the start of the image.
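Here is a minimal, assumption-laden C sketch of the loading step; the symbols image_rom_start, image_size, and RAM_BASE are placeholders that a real linker script or boot ROM layout would supply:

#include <stdint.h>
#include <string.h>

extern const uint8_t  image_rom_start[];    /* where the image is stored in ROM */
extern const uint32_t image_size;           /* number of bytes to copy          */
#define RAM_BASE 0x00008000u                /* hypothetical load address        */

void boot_load_and_run(void)
{
    memcpy((void *)RAM_BASE, image_rom_start, image_size);     /* copy code and data into RAM */

    void (*entry)(void) = (void (*)(void))(uintptr_t)RAM_BASE; /* point the pc at the image   */
    entry();                                                    /* hand over control           */
}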
Example: Initializing or organizing memory is an important part of the initialization code because
many operating systems expect a known memory layout before they can start.
Figure 1.5 shows memory before and after reorganization. It is common for ARM-based
embedded systems to provide for memory remapping because it allows the system to start the
initialization code from ROM at power-up. The initialization code then redefines or remaps the
memory map to place RAM at address 0x00000000—an important step because then the
exception vector table can be in RAM and thus can be reprogrammed.
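A hedged sketch of the remap step follows; the remap control register is hypothetical, vector_table_rom stands for a prebuilt vector table (branch instructions plus handler addresses) kept in ROM by the linker, and the code assumes it is itself running from a ROM alias outside the remapped region with interrupts still disabled:

#include <stdint.h>
#include <string.h>

#define REMAP_CTRL (*(volatile uint32_t *)0x11000000u)   /* hypothetical remap register */
extern const uint8_t vector_table_rom[];                 /* prebuilt vector table image */

void remap_and_install_vectors(void)
{
    REMAP_CTRL = 1u;                                     /* RAM now appears at 0x00000000            */
    memcpy((void *)0x00000000u, vector_table_rom, 64);   /* copy vectors and handler addresses to RAM */
}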
First the initialization process prepares the hardware for an operating system to take control.
An operating system organizes the system resources: the peripherals, memory, and
processing time. These resources can then be used efficiently by the different applications
running within the operating system environment.
ARM processors support over 50 operating systems. The operating systems fall into two main
categories:
1) Real-time operating systems (RTOSs)
2) Platform operating systems
ARM has developed a set of processor cores that specifically target each category.
1.4.3 Applications
The operating system schedules applications (giving CPU time slices to applications)—code
dedicated to handling a particular task. An application implements a processing task;
the operating system controls the environment.
An embedded system can have one active application or several applications running
simultaneously.
ARM processors are found in several market segments:
Networking
Automotive
Mobile and consumer devices, mass storage, and imaging.
Within each segment, ARM processors can be found in multiple applications.
1.5 Summary:
Pure RISC is aimed at high performance, but ARM uses a modified RISC design philosophy that also targets good
code density and low power consumption.
An embedded system consists of a processor core surrounded by caches, memory, and peripherals.
The system is controlled by operating system software that manages application tasks.
The key points in a RISC design philosophy are to improve performance by reducing
the complexity of instructions, to speed up instruction processing by using a pipeline, to
provide a large register set to store data near the core, and to use a load-store architecture.
The ARM design philosophy also incorporates some non-RISC ideas:
It allows variable cycle execution on certain instructions to save power, area, and code size.
[ EXTRA INFORMATION]
Module-1 [ PART-B ]
ARM core dataflow model, registers, current program status register, Pipeline, Exceptions,
Interrupts and Vector Table, Core extensions.
This part shows some core extensions that form an ARM processor. Core extensions speed up
and organize main memory as well as extend the instruction set.
It then covers the revisions to the ARM core architecture by describing the ARM core naming
conventions used to identify them and the chronological changes to the ARM instruction
set architecture.
The final section introduces the architecture implementations by subdividing them into
specific ARM processor core families.
A programmer can think of an ARM core as functional units connected by data buses, as shown
in Figure 2.1, where the arrows represent the flow of data, the lines represent the buses, and the
boxes represent either an operation unit or a storage area. The figure shows not only the flow of
data but also the abstract components that make up an ARM core.
Data items are placed in the register file—a storage bank made up of 32-bit registers. Since the
ARM core is a 32-bit processor, most instructions treat the registers as holding signed or
unsigned 32-bit values. The sign extend hardware converts signed 8-bit and 16-bit numbers to
32-bit values as they are read from memory and placed in a register. ARM instructions typically
have two source registers, Rn and Rm, and a single result or destination register, Rd. Source
operands are read from the register file using the internal buses A and B, respectively.
The ALU (arithmetic logic unit) or MAC (multiply-accumulate unit) takes the register
values Rn and Rm from the A and B buses and computes a result. Data processing instructions
write the result in Rd directly to the register file. Load and store instructions use the ALU to
generate an address to be held in the address register and broadcast on the Address bus.
One important feature of the ARM is that register Rm alternatively can be pre-processed
in the barrel shifter before it enters the ALU. Together the barrel shifter and ALU can calculate a
wide range of expressions and addresses. After passing through the functional units, the result in
Rd is written back to the register file using the Result bus. For load and store instructions the
incrementor updates the address register before the core reads or writes the next register value
from or to the next sequential memory location. The processor continues executing instructions
until an exception or interrupt changes the normal execution flow.
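One small illustration of the barrel shifter path described above: because the second operand Rm can be shifted on its way into the ALU, a C expression like the one below maps onto a single ARM data-processing instruction (for example ADD r0, r1, r2, LSL #2); the function name is illustrative:

unsigned int scaled_add(unsigned int r1, unsigned int r2)
{
    return r1 + (r2 << 2);   /* the shift of the second operand is folded into the ADD */
}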
The key components of the processor are the registers, the current program status register
(cpsr), and the pipeline.
2.1 Registers
General-purpose registers hold either data or an address and are used when executing
applications. The processor can operate in seven different modes.
All the registers shown are 32 bits in size.
There are up to 18 active registers:
16 data registers
2 processor status registers.
The data registers are visible to the programmer as r0 to r15.
The ARM processor has three registers assigned to a particular task or special function:
r13, r14, and r15. They are frequently given different labels to differentiate them from the
other registers.
Register r13 is traditionally used as the stack pointer (sp) and stores the head of the
stack in the current processor mode.
Register r14 is called the link register (lr) and is where the core puts the return
address whenever it calls a subroutine.
Register r15 is the program counter (pc) and contains the address of the next
instruction to be fetched by the processor.
Depending upon the context, registers r13 and r14 can also be used as general-
purpose
registers, which can be particularly useful since these registers are banked during a
processor mode change.
In ARM state the registers r0 to r13 are orthogonal: any instruction that you can
apply to r0 you can equally well apply to any of the other registers.
However, there are instructions that treat r14 and r15 in a special way.
In addition to the 16 data registers, there are two program status registers: cpsr and
spsr (the current and saved program status registers, respectively).
The register file contains all the registers available to a programmer. Which
registers are visible to the programmer depends upon the current mode of the
processor.
The ARM core uses the cpsr to monitor and control internal operations.
The cpsr is a dedicated 32-bit register and resides in the register file.
Figure 2.3 shows the basic layout of a generic program status register.
The cpsr is divided into four fields, each 8 bits wide: flags, status, extension, and control.
In current designs the extension and status fields are reserved for future use.
The control field contains the processor mode, state, and interrupt mask bits.
The flags field contains the condition flags.
Some ARM processor cores have extra bits allocated. For example, the J bit, which can
be found in the flags field, is only available on Jazelle-enabled processors, which execute
8-bit Java instructions.
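For reference, the generic cpsr layout described above can be written out as C bit masks; these bit positions are architecturally defined, but treat the list as a study aid rather than a complete definition for any particular core:

#define CPSR_N    (1u << 31)   /* negative flag            (flags field)         */
#define CPSR_Z    (1u << 30)   /* zero flag                                      */
#define CPSR_C    (1u << 29)   /* carry flag                                     */
#define CPSR_V    (1u << 28)   /* overflow flag                                  */
#define CPSR_J    (1u << 24)   /* Jazelle state bit (Jazelle-enabled cores only) */
#define CPSR_I    (1u << 7)    /* IRQ disable bit          (control field)       */
#define CPSR_F    (1u << 6)    /* FIQ disable bit                                */
#define CPSR_T    (1u << 5)    /* Thumb state bit                                */
#define CPSR_MODE 0x1Fu        /* processor mode, bits [4:0]                     */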
Processor Modes:
The processor mode determines which registers are active and the access rights to the
cpsr
register itself.
Each processor mode is either privileged or nonprivileged:
A privileged mode allows full read-write access to the cpsr. Conversely, a nonprivileged
mode only allows read access to the control field in the cpsr but still allows read-write
access to the condition flags.
There are seven processor modes in total: six privileged modes (abort, fast interrupt
request, interrupt request, supervisor, system, and undefined) and one nonprivileged mode
(user).
The processor enters abort mode when there is a failed attempt to access memory.
Fast interrupt request and interrupt request modes correspond to the two interrupt levels
available
on the ARM processor.
Supervisor mode is the mode that the processor is in after reset and is generally the mode
that an operating system kernel operates in.
System mode is a special version of user mode that allows full read-write access to the
cpsr.
Undefined mode is used when the processor encounters an instruction that is undefined or
not supported by the implementation.
User mode is used for programs and applications.
Another important feature to note is that the cpsr is not copied into the spsr when a mode change is forced due to a
program writing directly to the cpsr. The saving of the cpsr only occurs when an exception or interrupt is raised.
Figure 2.3 shows that the current active processor mode occupies the five least significant bits of the cpsr. When
power is applied to the core, it starts in supervisor mode, which is privileged. Starting in a privileged mode is
useful since initialization code can use full access to the cpsr to set up the stacks for each of the other modes.
Table 2.1 lists the various modes and the associated binary patterns. The last column of the table gives the bit
patterns that represent each of the processor modes in the cpsr.
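The bit patterns referred to by Table 2.1 can be summarized as a small C enumeration (these 5-bit mode encodings are architecturally defined):

enum arm_processor_mode {
    MODE_USER      = 0x10,   /* 10000 - user (nonprivileged)          */
    MODE_FIQ       = 0x11,   /* 10001 - fast interrupt request        */
    MODE_IRQ       = 0x12,   /* 10010 - interrupt request             */
    MODE_SVC       = 0x13,   /* 10011 - supervisor (mode after reset) */
    MODE_ABORT     = 0x17,   /* 10111 - abort                         */
    MODE_UNDEFINED = 0x1B,   /* 11011 - undefined                     */
    MODE_SYSTEM    = 0x1F    /* 11111 - system                        */
};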