Arab International University
Microcontroller
and Embedded
Systems
2019/2020
Dr Tarek Barhoum
Characteristics of embedded systems
Single-functioned
Executes a single program, repeatedly
Tightly-constrained
Low cost, low power, small, fast, etc.
Reactive and real-time
Continually reacts to changes in the system's environment
Must compute certain results in real-time without delay
An embedded system example
a digital camera
[Figure: digital camera chip with lens, CCD, A2D, CCD preprocessor, pixel coprocessor, D2A, JPEG codec, microcontroller, multiplier/accumulator, DMA controller, display controller, memory controller, ISA bus interface, UART, LCD controller]
Single-functioned -- always a digital camera
Tightly-constrained -- Low cost, low power, small, fast
Reactive and real-time -- reacts to environment changes
Design challenge – optimizing
design metrics
Common metrics
Unit cost: the monetary cost of manufacturing each copy of the
system, excluding NRE cost
NRE cost (Non-Recurring Engineering cost): the one-time
monetary cost of designing the system
Size: the physical space required by the system
Performance: the execution time or throughput of the system
Power: the amount of power consumed by the system
Flexibility: the ability to change the functionality of the system
without incurring heavy NRE cost
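The relation between unit cost and NRE cost can be sketched numerically. This is a minimal illustration, assuming the NRE cost is spread evenly over all units built. The function name and the dollar figures are hypothetical and not taken from the slides.

```python
def cost_per_product(unit_cost: float, nre_cost: float, units: int) -> float:
    """Unit cost plus the NRE cost spread evenly over every unit produced."""
    if units <= 0:
        raise ValueError("units must be positive")
    return unit_cost + nre_cost / units


# Hypothetical figures: $100 unit cost and $2,000 NRE cost.
for units in (10, 100, 1000):
    print(units, "units ->", cost_per_product(100.0, 2000.0, units), "per product")
```

The larger the production volume, the less the NRE cost weighs on each product, which is why low NRE matters most for low-volume designs.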
Design challenge – optimizing
design metrics
Common metrics (continued)
Time-to-prototype: the time needed to build a working
version of the system
Time-to-market: the time required to develop a system
to the point that it can be released and sold to customers
Maintainability: the ability to modify the system
after its initial release
Correctness, safety, many more
Three key embedded system
technologies
Technology
A manner of accomplishing a task, especially using technical
processes, methods, or knowledge
Three key technologies for embedded systems:
Processor technology
IC technology
Design technology
General-purpose processors
Single-purpose processors
Application-specific processors
IC technology
The manner in which a digital (gate-level)
implementation is mapped onto an IC
IC: Integrated circuit, or “chip”
IC technologies differ in their customization to a
design
ICs consist of numerous layers (perhaps 10 or more)
IC technologies differ with respect to who builds each layer and when
The minimum length of a transistor characterizes IC technology
[Figure: IC package containing the IC; transistor cross-section showing gate, oxide, source, channel, drain, and silicon substrate]
IC technology
4 types of IC technologies
Full-custom VLSI
Semi-custom ASIC (gate array and standard cell)
PLD (Programmable Logic Device)
FPGA (Field Programmable Gate Array)
Full-custom/VLSI
All layers are optimized for an embedded system's
particular digital implementation
Placing transistors
Sizing transistors
Routing wires
Benefits
Excellent performance, small size, low power
Drawbacks
High NRE cost (e.g., $300k), long time-to-market
Semi-custom
Lower layers are fully or partially built
Designers are left with routing of wires and maybe placing
some blocks
Benefits
Good performance, good size, less NRE cost than a full-custom
implementation (perhaps $10k to $100k)
Drawbacks
Still requires weeks to months to develop
CPLD/FPGA
All layers already exist
Designers can purchase an IC
Connections on the IC are either created or destroyed
to implement desired functionality
Benefits
Low NRE costs, almost instant IC availability
Drawbacks
Bigger, more expensive (perhaps $30 per unit), power-hungry, slower
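The trade-off between IC technologies can be made concrete with a break-even calculation. The sketch below uses the $300k full-custom NRE and the $30 FPGA unit price quoted on the slides. The $5 full-custom unit cost and the zero FPGA NRE are assumptions made only for illustration.

```python
def break_even_units(nre_a: float, unit_a: float, nre_b: float, unit_b: float) -> float:
    """Volume at which option A (higher NRE, lower unit cost) costs the same in total as option B."""
    if unit_b <= unit_a:
        raise ValueError("option A must have the lower unit cost")
    return (nre_a - nre_b) / (unit_b - unit_a)


# A = full-custom: $300k NRE (from the slides), $5 per unit (assumed).
# B = FPGA: $0 NRE (assumed), $30 per unit (from the slides).
print(break_even_units(300_000, 5, 0, 30), "units")
```

Below the break-even volume the FPGA is cheaper overall; above it the full-custom IC pays back its NRE cost.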
Processor
Digital circuit that performs computation tasks
Controller and datapath
General-purpose: variety of computation tasks
Custom single-purpose: non-standard task
A custom single-purpose processor may be (cf. CMEN5500, advanced digital design course)
Fast, small, low power
But high NRE, longer time-to-market, less flexible
[Figure: digital camera chip with lens, CCD, A2D, CCD preprocessor, pixel coprocessor, D2A, JPEG codec, microcontroller, multiplier/accumulator, DMA controller, display controller, memory controller, ISA bus interface, UART, LCD controller]
Introduction
General-Purpose Processor
Processor designed for a variety of computation tasks
Low unit cost, in part because manufacturer spreads NRE over
large numbers of units
Motorola sold half a billion 68HC05 microcontrollers in 1996 alone
Carefully designed since higher NRE is acceptable
Can yield good performance, size and power
Low NRE cost, short time-to-market/prototype, high
flexibility
User just writes software; no processor design
Microprocessor/Microcontroller
Architecture
Basic computer: Von Neumann, 1945
Contains
Program and data memory
Central unit (CU)
Program sequentially executed
Execution can be redirected (by a test)
[Figure: central unit (CU) connected to memory by address and data lines]
Microprocessor
[Figure: microprocessor (CU) made of a processing unit (ALU, register file) and a control unit, connected over the address and data buses to memory: ROM (program) and RAM (data)]
Basic concepts
Bus
Fixed number of wires
Connecting different blocks
Inside and outside the processor
Unidirectional, e.g. the address bus
Bidirectional, e.g. the data bus
One transmitter at a time
One or more receivers
Basic concepts
Memories
Storage places
Temporary: RAM
Static: SRAM
Dynamic: DRAM
Permanent
Factory programmed: ROM
PROM
EPROM
EEPROM
Serial
Parallel
Basic concepts
Program
Set of instructions
Stored in program memory
Sequentially executed
Binary coded on one (or more) words
Operation code (+, -, ...)
Operands
Code length depends on instruction and addressing mode
Program memory width is word width
[Figure: program memory holding instruction n (opcode n) followed by instruction n+1 (opcode n+1)]
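To illustrate how an instruction is binary coded into an operation code and operands, here is a minimal sketch. The 16-bit layout (4-bit opcode, 4-bit register number, 8-bit operand) and the opcode values are assumptions chosen for the example. They do not describe any particular processor.

```python
# Hypothetical 16-bit instruction word: [15:12] opcode, [11:8] register, [7:0] operand.
OPCODES = {"load": 0x1, "inc": 0x2, "store": 0x3}   # assumed opcode values
MNEMONICS = {code: name for name, code in OPCODES.items()}


def encode(op: str, reg: int, operand: int) -> int:
    """Pack the operation code and its operands into one program-memory word."""
    if not (0 <= reg < 16 and 0 <= operand < 256):
        raise ValueError("field out of range")
    return (OPCODES[op] << 12) | (reg << 8) | operand


def decode(word: int) -> tuple:
    """Split a program-memory word back into (mnemonic, register, operand)."""
    return MNEMONICS[word >> 12], (word >> 8) & 0xF, word & 0xFF


word = encode("load", 0, 0x50)
print(hex(word), decode(word))
```

With a fixed layout like this one, every instruction takes exactly one word. Processors whose code length depends on the addressing mode need extra words for longer operands.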
General-purpose processor architecture, more detailed
Control unit and datapath
Note similarity to single-purpose processor (all hardwired)
Key differences
Datapath is general
Control unit doesn't store the algorithm: the algorithm is "programmed" into the memory
[Figure: processor with control unit (controller, PC, IR) and datapath (ALU, registers) linked by control/status signals, connected to I/O and memory]
Clocking concepts
Frequency
External: crystal frequency
Internal: machine cycle
The external frequency divided by a fixed divisor
Why? Instruction execution steps
Fetch
Decode
Execute
Store
[Figure: crystal connected to the processor]
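The relation between crystal frequency and machine cycle can be sketched as a simple division. The divisor of 12 used below is that of the classic 8051 (12 oscillator periods per machine cycle). For any other processor, treat the divisor as an assumption to be checked in its datasheet.

```python
def machine_cycle_hz(crystal_hz: float, divisor: int) -> float:
    """Internal machine-cycle frequency obtained by dividing the crystal frequency."""
    return crystal_hz / divisor


def machine_cycle_time_s(crystal_hz: float, divisor: int) -> float:
    """Duration of one machine cycle in seconds."""
    return divisor / crystal_hz


# Classic 8051 with a 12 MHz crystal: 12 oscillator periods per machine cycle.
print(machine_cycle_hz(12_000_000, 12), "Hz")
print(machine_cycle_time_s(12_000_000, 12), "s")
```

A 12 MHz crystal therefore gives a 1 MHz machine cycle, i.e. one machine cycle per microsecond.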
Datapath Operations
Load
Read memory location into register
ALU operation
Input certain registers through ALU, store back in register
Store
Write register to memory location
[Figure: processor (control unit, datapath with ALU and registers, PC, IR) connected to I/O and memory; the value 10 is loaded from memory into a register, incremented to 11 by the ALU (+1), and 11 is stored back to memory]
Control Unit
Control unit: configures the datapath operations
Sequence of desired operations ("instructions") stored in memory: the "program"
Instruction cycle: broken into several sub-operations, each one clock cycle, e.g.:
Fetch: Get next instruction into IR
Decode: Determine what the instruction means
Fetch operands: Move data from memory to datapath register
Execute: Move data through the ALU
Store results: Write data from register to memory
[Figure: processor with control unit (controller, PC, IR) and datapath (ALU, registers R0, R1), connected to I/O and memory. Memory contents:
100 load R0, M[500]
101 inc R1, R0
102 store M[501], R1
500 10
501 ...]
Instruction execution steps
[Flowchart: start -> fetch -> decode -> operands? (yes: fetch operands; no: skip) -> execute -> results? (yes: store; no: skip) -> next instruction]
Control Unit Sub-Operations
Fetch
Get next instruction into IR
PC: program counter, always points to next instruction
IR: holds the fetched instruction
[Figure: PC = 100; IR = load R0, M[500]; memory holds 100: load R0, M[500]; 101: inc R1, R0; 102: store M[501], R1; M[500] = 10]
Control Unit Sub-Operations
Decode
Determine what the instruction means
[Figure: PC = 100; IR = load R0, M[500]; memory holds 100: load R0, M[500]; 101: inc R1, R0; 102: store M[501], R1; M[500] = 10]
Control Unit Sub-Operations
Fetch operands
Move data from memory to datapath register
[Figure: PC = 100; IR = load R0, M[500]; R0 = 10, copied from M[500]]
Control Unit Sub-Operations
Execute
Move data through the ALU
This particular instruction does nothing during this sub-operation
[Figure: PC = 100; IR = load R0, M[500]; R0 = 10; M[500] = 10]
Control Unit Sub-Operations
Store results
Write data from register to memory
This particular instruction does nothing during this sub-operation
[Figure: PC = 100; IR = load R0, M[500]; R0 = 10; M[500] = 10]
Instruction Cycles
PC=100: Fetch, Decode, Fetch ops, Exec., Store results (one clock cycle each)
[Figure: IR = load R0, M[500]; R0 = 10, loaded from M[500]]
Instruction Cycles
PC=100: Fetch, Decode, Fetch ops, Exec., Store results
PC=101: Fetch, Decode, Fetch ops, Exec., Store results
[Figure: IR = inc R1, R0; the ALU adds 1 to R0 = 10, giving R1 = 11]
Instruction Cycles
PC=100: Fetch, Decode, Fetch ops, Exec., Store results
PC=101: Fetch, Decode, Fetch ops, Exec., Store results
PC=102: Fetch, Decode, Fetch ops, Exec., Store results
[Figure: IR = store M[501], R1; R1 = 11 is written to M[501]]
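The three instruction cycles traced on these slides can be replayed with a small simulator. This is a teaching sketch only: the tuple encoding of instructions and the `run` function are assumptions made for the example, and each sub-operation is modelled as one Python statement rather than one clock cycle.

```python
def run(program: dict, memory: dict, start_pc: int):
    """Replay fetch / decode / fetch operands / execute / store for each instruction."""
    regs = {"R0": 0, "R1": 0}
    pc = start_pc
    while pc in program:
        ir = program[pc]                 # fetch: copy the instruction into IR
        pc += 1                          # PC now points to the next instruction
        op, dst, src = ir                # decode: find out what the instruction means
        if op == "load":
            regs[dst] = memory[src]      # fetch operands: memory -> datapath register
        elif op == "inc":
            regs[dst] = regs[src] + 1    # execute: move data through the ALU (+1)
        elif op == "store":
            memory[dst] = regs[src]      # store results: register -> memory
        else:
            raise ValueError(f"unknown instruction {op!r}")
    return regs, memory


program = {
    100: ("load", "R0", 500),    # load R0, M[500]
    101: ("inc", "R1", "R0"),    # inc R1, R0
    102: ("store", 501, "R1"),   # store M[501], R1
}
regs, memory = run(program, {500: 10, 501: 0}, start_pc=100)
print(regs, memory)
```

After the run, R0 holds 10, R1 holds 11 and M[501] holds 11, matching the register and memory values shown in the figures.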
Architectural Considerations
N-bit processor
N-bit ALU, registers, buses, memory data interface
Embedded: 8-bit, 16-bit, 32-bit common
Desktop/servers: 32-bit, even 64
PC size determines address space
[Figure: processor with control unit (controller, PC, IR) and datapath (ALU, registers), connected to I/O and memory]
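Since the PC holds the address of the next instruction, its width bounds the number of locations the processor can address. A minimal sketch of that relation, with an illustrative function name:

```python
def address_space(pc_bits: int) -> int:
    """Number of distinct addresses a program counter of the given width can hold."""
    return 2 ** pc_bits


for bits in (8, 16, 32):
    print(f"{bits}-bit PC -> {address_space(bits)} addressable locations")
```

A 16-bit PC, for example, reaches 65,536 locations, while a 32-bit PC reaches about 4.3 billion.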
Architectural Considerations
Clock frequency
Inverse of clock period
Clock period must be longer than the longest register-to-register delay in the entire processor
Memory access is often the longest
[Figure: processor with control unit (controller, PC, IR) and datapath (ALU, registers), connected to I/O and memory]
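The constraint on the clock period can be turned into an upper bound on the clock frequency. The path names and delays below are hypothetical values chosen for illustration. Real figures come from timing analysis of the actual design.

```python
def max_clock_hz(path_delays_ns: dict) -> float:
    """Upper bound on clock frequency: the period must cover the slowest path."""
    longest_ns = max(path_delays_ns.values())
    return 1e9 / longest_ns


# Hypothetical register-to-register delays, in nanoseconds.
delays = {
    "register -> ALU -> register": 4.0,
    "register -> register": 2.0,
    "memory -> register": 10.0,   # memory access is often the longest
}
print(max_clock_hz(delays) / 1e6, "MHz")
```

Here the 10 ns memory path limits the clock to just under 100 MHz, even though the ALU path alone would allow 250 MHz.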
Pipelining: Increasing Instruction Throughput
[Figure: dish cleaning with 8 dishes and two stages, Wash and Dry. Non-pipelined: each dish is washed and dried before the next one starts. Pipelined: dish n+1 is washed while dish n is dried.]
[Figure: pipelined instruction execution. Instructions 1 to 8 pass through Fetch-instr., Decode, Fetch ops., Execute and Store res., each stage running one time step behind the previous one.]
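The gain from pipelining can be estimated with two formulas. This sketch assumes every stage takes exactly one time unit and the pipeline never stalls, which real processors only approximate.

```python
def non_pipelined_time(n_items: int, n_stages: int) -> int:
    """Each item goes through all stages before the next item starts."""
    return n_items * n_stages


def pipelined_time(n_items: int, n_stages: int) -> int:
    """The first item takes n_stages steps; each later item finishes one step after the previous."""
    return n_stages + n_items - 1


# 8 dishes through 2 stages (wash, dry), then 8 instructions through 5 stages.
for label, items, stages in (("dishes", 8, 2), ("instructions", 8, 5)):
    print(label, non_pipelined_time(items, stages), "vs", pipelined_time(items, stages))
```

For the 8 dishes this gives 16 time units without pipelining against 9 with it. For 8 instructions in a five-stage pipeline it gives 40 against 12.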
Two Memory Architectures
Princeton (Von Neumann machine)
Fewer memory wires
Harvard
Separate buses for data and program
Simultaneous program and data memory access
[Figure: Harvard: processor connected to separate program memory and data memory. Princeton: processor connected to a single memory (program and data).]
Cache Memory
Memory access may be slow
Cache is small but fast memory close to processor
Holds copy of part of memory
Hits and misses
[Figure: processor and cache (fast/expensive technology, usually on the same chip) in front of memory (slower/cheaper technology, usually on a different chip)]
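Hits and misses can be counted with a toy cache model. The slides do not say how the cache is organised, so this sketch assumes a direct-mapped cache holding one word per line. The address trace is made up for the example.

```python
def simulate_direct_mapped(addresses, n_lines: int):
    """Count hits and misses for a direct-mapped cache with one word per line."""
    lines = [None] * n_lines          # each line remembers which address it holds
    hits = misses = 0
    for addr in addresses:
        index = addr % n_lines        # the address selects exactly one cache line
        if lines[index] == addr:
            hits += 1                 # the copy is already in the cache
        else:
            misses += 1               # go to slower memory, then keep a copy
            lines[index] = addr
    return hits, misses


# Hypothetical trace: a short loop over addresses 0..2, then a conflicting address 8.
print(simulate_direct_mapped([0, 1, 2, 0, 1, 2, 8, 0], n_lines=4))
```

The first pass over addresses 0 to 2 misses and the second pass hits. Address 8 then evicts address 0 because both map to line 0, so the final access to 0 misses again.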
Application-Specific
Instruction-Set Processors
(ASIPs)
General-purpose processors
Sometimes too general to be effective in
demanding applications
e.g., video processing – requires huge video buffers and
operations on large arrays of data, inefficient on a GPP
But a single-purpose processor has high NRE and is
not programmable
ASIPs – targeted to a particular domain
Contain architectural features specific to that
domain
e.g., embedded control, digital signal processing, video
processing, network processing, telecommunications, etc.
Still programmable
A Common ASIP:
Microcontroller
For embedded control applications
Reading sensors, setting actuators
Mostly dealing with events (bits): data is present, but not in huge
amounts
e.g., VCR, disk drive, digital camera (assuming SPP for
image compression), washing machine, microwave oven
Microcontroller features
On-chip peripherals
Timers, analog-digital converters, serial communication, etc.
Tightly integrated for programmer, typically part of register
space
On-chip program and data memory
Direct programmer access to many of the chip’s pins
Specialized instructions for bit-manipulation and other low-level
operations
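The bit-manipulation operations that microcontrollers support with dedicated instructions can be illustrated with ordinary masking. The sketch assumes an 8-bit I/O register. The helper names are illustrative, and a real microcontroller would perform each of these in a single instruction on a hardware register.

```python
REG_MASK = 0xFF   # assume an 8-bit register


def set_bit(reg: int, bit: int) -> int:
    """Drive one pin high without touching the others."""
    return (reg | (1 << bit)) & REG_MASK


def clear_bit(reg: int, bit: int) -> int:
    """Drive one pin low without touching the others."""
    return reg & ~(1 << bit) & REG_MASK


def is_bit_set(reg: int, bit: int) -> bool:
    """Test a single event flag or input pin."""
    return bool((reg >> bit) & 1)


port = 0b0000_0000
port = set_bit(port, 3)        # e.g. switch on an actuator wired to pin 3
print(f"{port:08b}", is_bit_set(port, 3))
port = clear_bit(port, 3)      # switch it off again
print(f"{port:08b}", is_bit_set(port, 3))
```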
Peripherals
Modules that help the microprocessor to communicate with the outside world
Independent or not
Includes
I/O units
Timers
Addressing units
USART
A/D converters
…
[Figure: processor surrounded by peripherals (ADC, decoder, timer, I/O, UART, USART) that link it to the outside world]
Microprocessor vs. Microcontroller
Microprocessor
Processing unit
Control unit
Microcontroller
Microprocessor
Data and program memory
I/O units
Timers
USART/UART
…
[Figure: microcontroller = microprocessor (CU: processing unit with ALU and registers, plus control unit) with program and data memory on the address and data buses, plus timers, UART, ADC, ...]
Example: The AT90S1200
• 0.8 µm
• 2 layer Alu
• 24 mm²
Another Common ASIP:
Digital Signal Processors
(DSP)
For signal processing applications
Large amounts of digitized data, often streaming
Data transformations must be applied fast
e.g., cell-phone voice filter, digital TV, music
synthesizer
DSP features
Several instruction execution units
Multiply-accumulate single-cycle instruction, other instrs.
Efficient vector operations – e.g., add two arrays
Vector ALUs, loop buffers, etc.
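The multiply-accumulate operation is the core of most signal-processing filters. As an illustration, here is a minimal FIR filter in which each inner-loop step is one multiply-accumulate. The function name and the sample data are assumptions for the example. A DSP would execute each step as a single-cycle instruction.

```python
def fir_filter(samples, coeffs):
    """Direct-form FIR filter: each output is a sum of coefficient * past-sample products."""
    out = []
    for n in range(len(samples)):
        acc = 0
        for k, c in enumerate(coeffs):
            if n - k >= 0:
                acc += c * samples[n - k]   # one multiply-accumulate (MAC) step
        out.append(acc)
    return out


# Two-tap filter that adds each sample to the previous one.
print(fir_filter([1, 2, 3, 4], [1, 1]))
```

With coefficients [1, 1] the filter returns [1, 3, 5, 7]: every output is the current sample plus the one before it.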
Processor
Classification
Internal data bus width
8 bits (8085, 89c51, 90s1200…)
16 bits (8086…)
32 bits (80486…)
Architecture and instruction set
CISC (89c51, 8085, 8086, 80486, 68000...)
RISC (90s1200, 12c508, 16c54…)
Limited number of instructions => faster
execution
Superscalar (TMS32cxxx...)
Parallel processing: several instructions at a time
Processor Classification
Utilization
General purpose: µP
Processing unit + control unit
Control: µC
µP + memories + peripherals
Digital signal processing: DSP
Special ALU
Floating-point multiplier
Memories
DMA
Processor Examples
Processor | Clock speed | Periph. | Bus Width | MIPS | Power | Trans. | Price
General Purpose Processors
Intel PIII | 1GHz | 2x16 K L1, 256K L2, MMX | 32 | ~900 | 97W | ~7M | $900
IBM PowerPC 750X | 550 MHz | 2x32 K L1, 256K L2 | 32/64 | ~1300 | 5W | ~7M | $900
MIPS R5000 | 250 MHz | 2x32 K, 2 way set assoc. | 32/64 | NA | NA | 3.6M | NA
StrongARM SA-110 | 233 MHz | None | 32 | 268 | 1W | 2.1M | NA
Microcontroller
Intel 8051 | 12 MHz | 4K ROM, 128 RAM, 32 I/O, Timer, UART | 8 | ~1 | ~0.2W | ~10K | $7
Motorola 68HC811 | 3 MHz | 4K ROM, 192 RAM, 32 I/O, Timer, WDT, SPI | 8 | ~.5 | ~0.1W | ~10K | $5
Digital Signal Processors
TI C5416 | 160 MHz | 128K, SRAM, 3 T1 Ports, DMA, 13 ADC, 9 DAC | 16/32 | ~600 | NA | NA | $34
Lucent DSP32C | 80 MHz | 16K Inst., 2K Data, Serial Ports, DMA | 32 | 40 | NA | NA | $75
Sources: Intel, Motorola, MIPS, ARM, TI, and IBM Website/Datasheet; Embedded Systems Programming, Nov. 1998
Summary
General-purpose processors
Good performance, low NRE, flexible
Controller, datapath, and memory
ASIPs
Microcontrollers, DSPs, network processors,
more customized ASIPs
Choosing among processors is an
important step