Design Technology
BY
HASSAN AL MANASRAH
TAMIR AL ZU’BI
Outline
Introduction
Automation: synthesis
Verification: hardware/software co-simulation
Reuse: intellectual property cores
Design process models
Introduction
System Design Goals
What does “design” mean?
The task of defining a system's functionality and converting that functionality into a
physical implementation, while
Satisfying constrained metrics
Optimizing other design metrics
Designing embedded systems is hard because of
Complex functionality
Millions of possible environment scenarios
E.g., an elevator controller: many possible combinations of buttons being pressed
Many competing, tightly constrained metrics
Productivity gap
As low as 10 lines of code or 100 transistors produced per day
Improving productivity
Design technologies have been developed to improve productivity. We focus on
technologies that advance a unified hardware/software view:
Automation: synthesis
A computer program replaces manual design effort.
Synthesis has made hardware design look more like software design.
Reuse
The process of using predesigned components.
Called cores in the hardware domain.
Verification
The task of ensuring the correctness/completeness of each design step.
E.g., hardware/software co-simulation.
[Figure: automation, verification, and reuse support the path from specification to implementation]
Automation: synthesis
The parallel evolution of compilation and synthesis
Synthesis levels
Logic synthesis
Two-level logic minimization
Multi-level logic minimization
FSM synthesis
Technology mapping
Register-transfer synthesis
Behavioral synthesis
System synthesis and hardware/software co-design
The parallel evolution of compilation and synthesis
In the early days, design was mostly hardware; software was fairly simple.
Software complexity increased with the advent of the general-purpose processor.
Different techniques emerged for software design and hardware design
This caused a division of the two fields
The hardware/software design fields are now rejoining
Both can start from a behavioral description in a sequential program model
[Figure: The co-design ladder. Both sides start from sequential program code (e.g., C, VHDL) and end at an implementation.
Software side: compilers (1960s, 1970s) → assembly instructions → assemblers, linkers (1950s, 1960s) → machine instructions → microprocessor plus program bits
Hardware side: behavioral synthesis (1990s) → register transfers → RT synthesis (1980s, 1990s) → logic equations / FSMs → logic synthesis (1970s, 1980s) → logic gates → VLSI, ASIC, or PLD implementation]
Cont.
Software design evolution
Machine instructions
A collection of machine instructions is called a program (0s and 1s).
Assemblers
Convert assembly programs into machine instructions, because working directly with huge numbers of 0s and 1s is difficult.
Compilers
Translate sequential programs into assembly.
Hardware design evolution
Interconnected logic gates
Logic synthesis
Converts logic equations or FSMs into gates.
Register-transfer (RT) synthesis
Converts FSMDs into FSMs, logic equations, and predesigned RT components (registers, adders, etc.).
Behavioral synthesis
Converts sequential programs into FSMDs.
Hardware design involves many more dimensions: a compiler only has to generate assembly instructions that implement the program,
while a hardware designer is also concerned with size, power, performance, and other metrics.
Synthesis Levels
Gajski’s Y-chart
Each axis represents a type of description
Behavioral
Defines outputs as a function of inputs
Structural
Implements behavior by connecting components with known behavior
Physical
Gives sizes/locations of components and wires on chip/board
Synthesis converts behavior at a given level to structure at the same level or lower
E.g.,
FSM → gates, flip-flops (same level)
FSM → transistors (lower level)
Not FSM → registers, FUs (higher level)
Not FSM → processors, memories (higher level)
[Figure: Gajski’s Y-chart, levels listed from highest to lowest abstraction
Structural axis: processors, memories; registers, FUs, MUXs; gates, flip-flops; transistors
Behavioral axis: sequential programs; register transfers; logic equations/FSMs; transfer functions
Physical axis: boards; chips; modules; cell layout
Example labels: carry-ripple adder (structural), addition (behavioral)]
Logic Synthesis
Converts logic-level behavior to a structural implementation
Logic equations and/or FSMs are converted to connected gates.
Combinational logic synthesis
Two-level minimization
Multilevel minimization
FSM synthesis
State minimization
State encoding
Two-level minimization
Represent logic function as sum of products (or product of sums)
AND gate for each product
OR gate for each sum
This minimization gives best possible performance
At most 2 gate delays
Goal: minimize size
Minimum cover
Minimum cover that is prime
Sum of products
F = abc'd' + a'b'cd + a'bcd + ab'cd
Direct implementation
4 4-input AND gates and
1 4-input OR gate
→ 40 transistors
[Figure: direct two-level implementation of F with inputs a, b, c, d]
Minimum Cover
Minimum # of AND gates (sum of products)
Literal: variable or its complement
a or a’, b or b’, etc.
Minterm: product of literals
Each variable appears exactly once
abc’d’, ab’cd, a’bcd, etc.
Implicant: product of literals
Each variable appears no more than once
abc’d’, a’cd, etc.
Covers 1 or more minterms
a’cd covers a’bcd and a’b’cd
Cover: set of implicants that covers all minterms of function
Minimum cover: cover with minimum # of implicants
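The definitions above can be checked mechanically. The following is a minimal sketch, assuming four variables a, b, c, d and product terms written with an ASCII apostrophe for complement; the helper names `parse`, `minterms_covered`, and `is_cover` are illustrative and not from the slides.

```python
from itertools import product

VARS = "abcd"  # assumed variable order for the examples on these slides

def parse(term):
    """Parse a product term such as "a'cd" into {variable: required value}."""
    literals = {}
    i = 0
    while i < len(term):
        var = term[i]
        negated = i + 1 < len(term) and term[i + 1] == "'"
        literals[var] = 0 if negated else 1
        i += 2 if negated else 1
    return literals

def minterms_covered(term):
    """Return every (a, b, c, d) assignment for which the product term is 1."""
    literals = parse(term)
    covered = set()
    for values in product((0, 1), repeat=len(VARS)):
        assignment = dict(zip(VARS, values))
        if all(assignment[v] == want for v, want in literals.items()):
            covered.add(values)
    return covered

def is_cover(implicants, function_minterms):
    """True if the implicants cover exactly the minterms of the function."""
    on_set = set().union(*(minterms_covered(m) for m in function_minterms))
    covered = set().union(*(minterms_covered(t) for t in implicants))
    return covered == on_set

F = ["abc'd'", "a'b'cd", "a'bcd", "ab'cd"]
print(sorted(minterms_covered("a'cd")))          # the two minterms a'b'cd and a'bcd
print(is_cover(["abc'd'", "a'cd", "ab'cd"], F))  # a three-implicant cover of F
```

An implicant with k literals out of n variables covers 2^(n-k) minterms, which is why dropping one literal from a'bcd doubles the coverage.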
Cont.
Minimum cover: K-map approach
Karnaugh map (K-map)
1 represents minterm
Circle represents implicant
Minimum cover
Covering all 1’s with min # of circles
K-map for F (rows ab, columns cd)
ab\cd  00  01  11  10
00      0   0   1   0
01      0   0   1   0
11      1   0   0   0
10      0   0   1   0
Sum of products: one circle per 1 (4 circles)
Minimum cover: 3 circles
F = abc'd' + a'cd + ab'cd
Minimum cover implementation
2 4-input AND gates
1 3-input AND gate
1 3-input OR gate
→ 28 transistors
Example: direct vs. min cover
Fewer gates
4 vs. 5
Fewer transistors
28 vs. 40
Minimum cover that is prime
Minimum # of inputs to AND gates
Prime implicant
Implicant not covered by any other implicant
Max-sized circle in K-map
Minimum cover that is prime
Covering with min # of prime implicants
Min # of max-sized circles
K-map: minimum cover that is prime (rows ab, columns cd)
ab\cd  00  01  11  10
00      0   0   1   0
01      0   0   1   0
11      1   0   0   0
10      0   0   1   0
F = abc'd' + a'cd + b'cd
Implementation
1 4-input AND gate
2 3-input AND gates
1 3-input OR gate
→ 26 transistors
Example: prime cover vs. min cover
Same # of gates
4 vs. 4
Fewer transistors
26 vs. 28
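As a sanity check on the three forms of F used on these slides, the sketch below compares their truth tables and recomputes the transistor counts. It assumes 2 transistors per gate input (the figure used later in the multilevel example) and ignores inverters; the function names are illustrative.

```python
from itertools import product

def direct(a, b, c, d):
    na, nb, nc, nd = 1 - a, 1 - b, 1 - c, 1 - d
    return (a & b & nc & nd) | (na & nb & c & d) | (na & b & c & d) | (a & nb & c & d)

def min_cover(a, b, c, d):
    na, nb, nc, nd = 1 - a, 1 - b, 1 - c, 1 - d
    return (a & b & nc & nd) | (na & c & d) | (a & nb & c & d)

def prime_cover(a, b, c, d):
    na, nb, nc, nd = 1 - a, 1 - b, 1 - c, 1 - d
    return (a & b & nc & nd) | (na & c & d) | (nb & c & d)

def same_function(f, g):
    """Exhaustively compare two 4-input functions."""
    return all(f(*v) == g(*v) for v in product((0, 1), repeat=4))

def transistors(and_gate_sizes):
    """Two-level cost: AND-gate inputs plus one OR input per AND gate,
    at an assumed 2 transistors per gate input (inverters not counted)."""
    gate_inputs = sum(and_gate_sizes) + len(and_gate_sizes)
    return 2 * gate_inputs

print(same_function(direct, min_cover), same_function(direct, prime_cover))
print(transistors([4, 4, 4, 4]), transistors([4, 3, 4]), transistors([4, 3, 3]))
```

The counts reproduce the 40, 28, and 26 transistors quoted for the direct, minimum-cover, and prime-cover implementations.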
Minimum cover: heuristics
K-maps give optimal solution every time
Functions with > 6 inputs too complicated
Use computer-based tabular method
Finds all prime implicants
Finds min cover that is prime
Also optimal solution every time
Problem: 2^n minterms for n inputs
32 inputs = 4 billion minterms
Exponential complexity
Heuristic
Solution technique where optimal solution not guaranteed
Hopefully comes close
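The tabular method mentioned above can be sketched in a few lines. This is a simplified illustration in the spirit of the Quine-McCluskey procedure, not a production minimizer: implicants are strings over '0', '1', and '-' in variable order abcd, and the final cover is chosen by brute force, which is exactly the exponential step that motivates heuristics. All function names are assumptions made for this example.

```python
from itertools import combinations

def combine(x, y):
    """Merge two implicants that differ in exactly one fixed (non-dash) position."""
    diff = [i for i, (p, q) in enumerate(zip(x, y)) if p != q]
    if len(diff) == 1 and '-' not in (x[diff[0]], y[diff[0]]):
        i = diff[0]
        return x[:i] + '-' + x[i + 1:]
    return None

def prime_implicants(minterms, n):
    """Repeatedly merge implicants; anything that can no longer merge is prime."""
    current = {format(m, f"0{n}b") for m in minterms}
    primes = set()
    while current:
        merged, used = set(), set()
        for x, y in combinations(sorted(current), 2):
            z = combine(x, y)
            if z is not None:
                merged.add(z)
                used.update((x, y))
        primes |= current - used
        current = merged
    return primes

def covers(implicant, minterm, n):
    bits = format(minterm, f"0{n}b")
    return all(p in ('-', b) for p, b in zip(implicant, bits))

def minimum_prime_cover(minterms, n):
    """Smallest set of prime implicants covering every minterm (brute force)."""
    primes = sorted(prime_implicants(minterms, n))
    for k in range(1, len(primes) + 1):
        for subset in combinations(primes, k):
            if all(any(covers(p, m, n) for p in subset) for m in minterms):
                return set(subset)
    return set()

# F = abc'd' + a'b'cd + a'bcd + ab'cd  ->  minterms 12, 3, 7, 11
print(minimum_prime_cover([12, 3, 7, 11], 4))
```

For F the result corresponds to abc'd' (1100), a'cd (0-11), and b'cd (-011), matching the prime cover found with the K-map.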
Heuristics: iterative improvement
Start with initial solution
i.e., original logic equation
Repeatedly make modifications toward better solution
Common modifications
Expand
Replace each nonprime implicant with a prime implicant covering it
Delete all implicants covered by new prime implicant
Reduce
Opposite of expand
Reshape
Expands one implicant while reducing another
Maintains total # of implicants
Irredundant
Selects min # of implicants that cover from existing implicants
Synthesis tools differ in modifications used and the order they are used
Multilevel logic minimization
Trade performance for size
Increase delay for lower # of gates
2-level gives best performance
Max delay = 2 gates
Solve for smallest size
Multilevel gives Pareto-optimal solution
Minimum delay for a given size
Minimum size for a given delay
[Figure: delay vs. size plot with 2-level minim. and multi-level minim. labeled. Gray area represents all possible solutions; circle with X represents ideal solution, generally not possible.]
Example
Minimized 2-level logic function:
F = adef + bdef + cdef + gh
Requires 5 gates with 18 total gate inputs
4 ANDs and 1 OR
After algebraic manipulation:
F = (a + b + c)def + gh
Requires only 4 gates with 11 total gate inputs
2 ANDs and 2 ORs
Fewer inputs per gate
Assume each gate input = 2 transistors
Reduced by 14 transistors
36 (18 * 2) down to 22 (11 * 2)
Sacrifices performance for size
Inputs a, b, and c now have 3-gate delay
Iterative improvement heuristic commonly used
[Figure: gate diagrams of the 2-level minimized and multilevel minimized implementations of F]
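The arithmetic in this example can be replayed directly. The sketch below checks that the two forms of F agree on all 256 input combinations and recounts the gate inputs; the function and constant names are illustrative.

```python
from itertools import product

def two_level(a, b, c, d, e, f, g, h):
    return (a and d and e and f) or (b and d and e and f) or (c and d and e and f) or (g and h)

def multilevel(a, b, c, d, e, f, g, h):
    return ((a or b or c) and d and e and f) or (g and h)

equivalent = all(
    bool(two_level(*v)) == bool(multilevel(*v))
    for v in product((False, True), repeat=8)
)

# Gate inputs: three 4-input ANDs, one 2-input AND, one 4-input OR
two_level_inputs = 3 * 4 + 2 + 4
# Gate inputs: one 3-input OR, one 4-input AND, one 2-input AND, one 2-input OR
multilevel_inputs = 3 + 4 + 2 + 2

TRANSISTORS_PER_INPUT = 2  # the assumption stated on the slide
saved = (two_level_inputs - multilevel_inputs) * TRANSISTORS_PER_INPUT

print(equivalent, two_level_inputs, multilevel_inputs, saved)
```

The longest path in the multilevel form runs through the 3-input OR, the 4-input AND, and the final OR, which is the 3-gate delay seen by a, b, and c.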
FSM synthesis
Converting FSM to gates
State minimization
Reduce # of states
Smaller state register and fewer gates
Identify and merge equivalent states
Outputs, next states same for all possible inputs
Tabular method gives exact solution
• Table of all possible state pairs
• If n states, n^2 table entries
• Heuristics used with large # of states
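One way to find equivalent states is partition refinement, sketched below. Note that this is a different formulation from the pair table on the slide, although it reaches the same equivalence classes. It assumes a Mealy-style FSM given as dictionaries keyed by (state, input); the example machine and all names are hypothetical.

```python
def minimize_states(states, inputs, next_state, output):
    """Group states into equivalence classes by repeated refinement.

    Two states stay in the same block only if, for every input, they produce
    the same output and move to states that are in the same block.
    """
    block_of = {s: 0 for s in states}  # start with every state in one block
    while True:
        ids = {}
        refined = {}
        for s in states:
            signature = tuple(
                (output[s, i], block_of[next_state[s, i]]) for i in inputs
            )
            key = (block_of[s], signature)
            refined[s] = ids.setdefault(key, len(ids))
        if len(ids) == len(set(block_of.values())):
            return refined  # no block was split, so the partition is stable
        block_of = refined

# Hypothetical 4-state machine in which A~C and B~D
states = ["A", "B", "C", "D"]
inputs = [0, 1]
next_state = {("A", 0): "B", ("A", 1): "C",
              ("B", 0): "A", ("B", 1): "D",
              ("C", 0): "B", ("C", 1): "A",
              ("D", 0): "C", ("D", 1): "B"}
output = {("A", 0): 0, ("A", 1): 0,
          ("B", 0): 1, ("B", 1): 0,
          ("C", 0): 0, ("C", 1): 0,
          ("D", 0): 1, ("D", 1): 0}

blocks = minimize_states(states, inputs, next_state, output)
print(blocks)  # states sharing a block number can be merged
```

Merging A with C and B with D halves the state count here, so one state bit suffices instead of two.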
State encoding
Unique bit sequence for each state.
If n states, ⌈log2(n)⌉ bits suffice to give each state a unique encoding.
n! possible encodings.
Thus, heuristics common.
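A quick calculation shows why exhaustive search over encodings is impractical. The sketch below assumes the minimum number of state bits is used; the n! figure on the slide corresponds to n being a power of 2, and the count is larger when some codes are left unused. The function names are illustrative.

```python
from math import factorial

def encoding_bits(n_states):
    """Minimum number of bits needed to give each state a distinct code."""
    return max(1, (n_states - 1).bit_length())

def encoding_count(n_states):
    """Number of ways to assign distinct minimum-width codes to the states."""
    codes = 2 ** encoding_bits(n_states)
    return factorial(codes) // factorial(codes - n_states)

for n in (4, 5, 8, 16):
    print(n, encoding_bits(n), encoding_count(n))
```

Even 16 states already allow 16! (about 2 * 10^13) assignments, which is why heuristics are common.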
Technology mapping
Library of gates available for implementation
Simple
only 2-input AND,OR gates
Complex
various-input AND,OR,NAND,NOR,etc. gates
Efficiently implemented meta-gates (i.e., AND-OR-INVERT,MUX)
Final structure consists of specified library’s components only
If technology mapping integrated with logic synthesis
More efficient circuit
More complex problem
Register-transfer synthesis
Converts FSMD to custom single-purpose processor
Datapath
Register units to store variables
Complex data types
Functional units
Arithmetic operations
Connection units
Buses, MUXs
FSM controller
Controls datapath
Key sub problems:
Allocation
Instantiate storage, functional, connection units
Binding
Mapping FSMD operations to specific units
Behavioral synthesis
High-level synthesis
Converts a single sequential program to a single-purpose processor
Unlike an FSMD, a sequential program does not require the designer to schedule operations into states
Behavioral synthesis tools use advanced techniques to carry out scheduling, allocation, and binding
Key sub problems (implementing a sequential program needs all three)
Allocation
Binding
Scheduling
Assign sequential program’s operations to states
Optimizations important
Compiler
Constant propagation, dead-code elimination, loop unrolling
Advanced techniques for allocation, binding, scheduling
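To make the scheduling sub problem concrete, here is a minimal sketch of as-soon-as-possible (ASAP) scheduling, one simple textbook approach that the slides do not name. It assumes an acyclic dependency graph and unlimited functional units, so each operation is placed in the earliest state after all of its operands are available; the operation names are hypothetical.

```python
def asap_schedule(deps):
    """Map each operation to the earliest state in which it can execute.

    deps maps an operation to the list of operations whose results it needs.
    Assumes the dependency graph has no cycles.
    """
    state = {}

    def visit(op):
        if op not in state:
            state[op] = 1 + max((visit(d) for d in deps[op]), default=0)
        return state[op]

    for op in deps:
        visit(op)
    return state

# y = (a * b) + (c * d); z = y - e
deps = {
    "mul1": [],            # a * b
    "mul2": [],            # c * d
    "add": ["mul1", "mul2"],
    "sub": ["add"],
}
schedule = asap_schedule(deps)
print(schedule)
```

Both multiplications land in the same state, so this schedule needs two multipliers; a resource-constrained scheduler would trade an extra state for a single shared multiplier, which is where scheduling interacts with allocation and binding.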
System synthesis
System: a collection of processors
Embedded systems are becoming much more complex
Multiple processes may provide better performance/power
May be better described using concurrent sequential programs
System synthesis: convert 1 or more processes into 1 or more processors
Tasks
Transformation
Can merge 2 exclusive processes into 1 process
Can break 1 large process into separate processes
Allocation
Essentially design of system architecture
Select processors to implement processes
Also select memories and busses
Cont.
Tasks (cont.)
Partitioning
Mapping 1 or more processes to 1 or more processors
Variables among memories
Communications among buses
Scheduling
Determining when each of the multiple processes mapped to a single processor gets a
chance to execute on that processor
Memory accesses and bus communications must also be scheduled
Tasks performed in variety of orders
Iteration among tasks common
Cont.
Synthesis driven by constraints
E.g.,
Meet performance requirements at minimum cost
Allocate as much behavior as possible to general-purpose processor
• Low-cost/flexible implementation
Minimum # of SPPs used to meet performance
System synthesis for GPP only (software)
Common for decades
Multiprocessing
Parallel processing
Real-time scheduling
Hardware/software codesign
Simultaneous consideration of GPPs/SPPs during synthesis
Made possible by maturation of behavioral synthesis in 1990’s
Verification
Verification is the task of ensuring that a design is correct and
complete.
o Correctness
The design implements its specification correctly.
o Completeness
The design's specification describes appropriate output responses to all
relevant input sequences.
There are two main verification approaches
Formal verification
Simulation
Formal Verification
Formal verification analyzes a design to prove or disprove certain properties,
for example the correctness of a particular design or the completeness of a behavioral description.
Correctness verification
Verify that a structural description correctly implements a behavioral description by proving that the
two descriptions are equivalent.
Example:
Prove ALU structural implementation equivalent to behavioral description.
1. Derive Boolean equations for outputs.
2. Create truth table for equations.
3. Compare to truth table from original behavior.
Completeness verification
Verify the completeness of a behavioral description by proving that certain situations can never occur.
Example:
Formally prove elevator door can never open while elevator is moving.
1. Derive conditions for door being open.
2. Show conditions conflict with conditions for elevator moving.
Drawbacks:
Formal verification is very hard
Limited to small designs or to verifying only certain key properties
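The truth-table comparison in the ALU example can be illustrated on a much smaller circuit. The sketch below uses a 1-bit full adder as a stand-in for the ALU (an assumption made to keep the example short): the behavioral description is plain addition, the structural description is a gate-level network, and equivalence is established by comparing every row of the two truth tables.

```python
from itertools import product

def behavioral(a, b, cin):
    """Behavior: add three bits and return (sum, carry-out)."""
    total = a + b + cin
    return total & 1, total >> 1

def structural(a, b, cin):
    """Structure: XOR, AND, and OR gates wired as a full adder."""
    p = a ^ b
    s = p ^ cin
    cout = (a & b) | (p & cin)
    return s, cout

def buggy_structural(a, b, cin):
    """Same network with the carry-propagate term accidentally omitted."""
    p = a ^ b
    return p ^ cin, a & b

def equivalent(f, g, n_inputs):
    """Compare the complete truth tables of two n-input descriptions."""
    return all(f(*v) == g(*v) for v in product((0, 1), repeat=n_inputs))

print(equivalent(behavioral, structural, 3))
print(equivalent(behavioral, buggy_structural, 3))
```

The table has 2^n rows, so this exhaustive style only scales to small designs, in line with the drawbacks listed above.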
Simulation
Simulation is an approach in which we create a model of the design that can be executed on a computer.
We apply input values to the model and check that its output values
match the expected values.
Correctness verification
Example:
Check that the ALU structural implementation matches the behavioral description.
1. Provide all possible input combinations to the model.
2. Check the ALU outputs for correct results.
Completeness verification
Example:
Check that the elevator door is closed whenever the elevator is moving.
1. Provide all possible input sequences.
2. Check that the door is always closed when the elevator is moving.
Simulating all possible inputs is usually impossible. For example, a 32-bit ALU with two 32-bit operands
has 2^32 * 2^32 = 2^64 possible input combinations, which would take far too long to
simulate.
A designer can only simulate a tiny subset of possible inputs, typically including typical values and
boundary values.
Simulation increases confidence in the correctness/completeness of the design but does not prove
anything.
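To put the 32-bit ALU figure in perspective, the following back-of-the-envelope calculation estimates how long exhaustive simulation would take. The simulation rate is a purely hypothetical assumption chosen to be generous; real gate-level simulators are far slower.

```python
combinations = 2**32 * 2**32          # two independent 32-bit operands
RATE_PER_SECOND = 10**9               # hypothetical: one billion input pairs per second
SECONDS_PER_YEAR = 365 * 24 * 3600

years = combinations / RATE_PER_SECOND / SECONDS_PER_YEAR
print(f"{combinations:.3e} combinations, about {years:.0f} years")
```

Even at this optimistic rate the run would take several centuries, and that is before counting the different ALU operations.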
Simulation advantages & disadvantages
Simulation has several advantages over a physical implementation for testing and
debugging a system.
o Controllability
The ability to control the execution of the system, such as the passage of time and the system's data inputs.
o Observability
The ability to examine system values: the user can stop the simulation and observe internal system values.
o Debugging
The user can stop the simulation at any point, change input values, internal values, or
environment values, and then resume.
o Setup time
Simulation takes less setup time than a physical implementation, and lets us test the system and check
its outputs before building it in hardware.
Simulation also has disadvantages
o Setting up a simulation can take a long time for a complex external environment.
o The environment models are likely to be incomplete, so environment behavior may not be modeled correctly.
o Simulation is much slower than a physical implementation.
Cont.
The most significant disadvantage is simulation speed
E.g., a physical implementation of a microprocessor may execute 100 million instructions per second, while a simulation of a gate-
level model may execute only 10 instructions per second, a gap of seven orders of magnitude.
Simulation is slow for many reasons:
Sequentializing a parallel design
Suppose a design has 1,000,000 logic gates. In hardware all of these gates operate in parallel, but a simulator running on a
sequential computer must evaluate the inputs and outputs of the gates one at a time.
Several programs sit between the simulated system and the real hardware
The simulator must interpret the model, read the inputs, compute, and generate the outputs, all of which takes
time; in addition, the simulator runs on top of an operating system, which adds further delay.
Overcoming slow simulation speed
o Reduce the amount of real time simulated
Instead of simulating hours of real operation, we might simulate only milliseconds
o Use a faster simulator
There are two ways to make a simulator faster
Build and use special hardware for simulation, known as emulators.
Use a simulator that is less precise/accurate, giving up some controllability and observability.
Cont.
Don’t need gate-level analysis for all simulations
E.g., cruise control
Don’t care what happens at every input/output of each logic gate
Simulating RT components ~10x faster
Cycle-based simulation ~100x faster
Accurate at clock boundaries only
No information on signal changes between boundaries
Faster simulator often combined with reduction in real time
If willing to simulate for 10 hours
Use instruction-set simulator
Real execution time simulated
10 hours * 1 / 10,000
= 0.001 hour
= 3.6 seconds
Time needed to execute what the real IC does in 1 hour, by method:
Slowdown      Method                                    Time
1             IC                                        1 hour
10            FPGA                                      1 day
100           hardware emulation                        4 days
1,000         throughput model                          1.4 months
10,000        instruction-set simulation                1.2 years
100,000       cycle-accurate simulation                 12 years
1,000,000     register-transfer-level HDL simulation    >1 lifetime
10,000,000    gate-level HDL simulation                 1 millennium
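The trade-off between simulator speed and simulated real time is simple arithmetic. The sketch below uses the slowdown factors from the table above; the dictionary keys and function names are illustrative.

```python
# Slowdown relative to the real IC, taken from the table above
SLOWDOWN = {
    "fpga": 10,
    "hardware_emulation": 100,
    "throughput_model": 1_000,
    "instruction_set_simulation": 10_000,
    "cycle_accurate_simulation": 100_000,
    "rtl_hdl_simulation": 1_000_000,
    "gate_level_hdl_simulation": 10_000_000,
}

def real_seconds_simulated(wall_clock_hours, method):
    """Seconds of real system operation covered by a simulation run."""
    return wall_clock_hours * 3600 / SLOWDOWN[method]

def wall_clock_years(real_hours, method):
    """Years of simulation needed to cover the given hours of real operation."""
    return real_hours * SLOWDOWN[method] / (24 * 365)

print(real_seconds_simulated(10, "instruction_set_simulation"))  # the 10-hour example
print(wall_clock_years(1, "gate_level_hdl_simulation"))          # the last table row
```

The first call reproduces the 3.6 seconds from the slide; the second gives roughly 1,140 years, which the table rounds to a millennium.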
Hardware/software co-simulation
A hardware/software co-simulator is designed to hide the details of integrating an ISS and an HDL
simulator.
There are many simulation approaches, varying in speed, precision, and accuracy.
They range from very detailed, such as a gate-level model, to very abstract,
such as an instruction-level model.
Simulation tools evolved separately for hardware and software
Software: general-purpose processor (GPP)
Typically simulated with an instruction-set simulator (ISS)
Hardware: single-purpose processor (SPP)
Typically simulated with models in an HDL environment
Integrating GPPs and SPPs onto a single IC increased the need to simulate the two kinds of
processors together, by merging the software and hardware simulation tools.
There are two approaches to merging software and hardware simulation
o The simple way is to create an HDL model of the GPP, which runs the system's software,
and integrate it with the HDL model of the SPP. This has two disadvantages:
Much slower than ISS
Less observable/controllable than ISS
o The alternative is to let the GPP's ISS and the SPP's HDL simulator each run on its own
and exchange data over a shared communication channel when needed. This is known as
hardware/software co-simulation.
Cont.
Besides integrating the two simulators, modern hardware/software co-simulators
minimize the communication between them.
E.g., both the GPP and the SPP must access memory
Where should memory go?
In ISS?
HDL simulator must stall for memory access
In HDL?
ISS must stall when fetching each instruction
The solution is to model memory in both the ISS and the HDL simulator, so that each processor has its own copy,
and to keep only the shared data updated in both.
Huge speedups (100x or more) reported with this technique.
Emulators
An emulator is a general physical device onto which a system can be mapped relatively quickly and
which can be placed in the system's real environment.
Emulators were created to address the problems of simulation: expensive environment setup,
incomplete environment models, and slow simulation speed.
An emulator may consist of a microprocessor IC with monitoring and control circuitry.
It may contain tens or hundreds of FPGAs, and usually supports debugging tasks.
Emulation has several advantages over simulation:
Mapped relatively quickly
Hours, days
Can be placed in real environment
No environment setup time
No incomplete environment
Typically faster than simulation
Hardware implementation
Cont.
Emulation also has disadvantages:
o Still not as fast as real implementations
E.g., emulated cruise-control may not respond fast enough to keep control of car
o Mapping still time consuming
E.g., mapping a complex SOC to 10 FPGAs: just partitioning into 10 parts could take weeks
o Can be very expensive
Top-of-the-line FPGA-based emulator: $100,000 to $1 million
Leads to a resource bottleneck: a company may be able to afford only one emulator, so groups must wait for access
Reuse: intellectual property cores
Designers have long had commercial off-the-shelf (COTS) components available:
predesigned, packaged ICs that reduce design and debug time.
A system-on-a-chip (SOC) implements all components of a system on a single
chip, made possible by increasing IC capacities.
This is changing the way COTS components are sold: as intellectual
property (IP) rather than as actual ICs.
IP is sold as behavioral, structural, or physical descriptions.
Designers can integrate these descriptions with others to form one large SOC.
Processor-level components are known as cores; a core is a GPP or SPP IP
component.
Cont.
Soft core
Synthesizable behavioral description
Typically written in HDL (VHDL/Verilog)
Firm core
Structural description
Typically provided in HDL
Hard core
Physical description
Provided in variety of physical layout file formats
[Figure: Gajski’s Y-chart; soft, firm, and hard cores correspond to the behavioral, structural, and physical axes]
Hard/Soft core advantages & disadvantages
Hard cores
Ease of use
Developer already designed and tested hard core
Can use right away
Can expect to work correctly
Predictability
Size, power, performance predicted accurately
Specific to an exact IC process, and not easily mapped (retargeted) to a different process
E.g., core available for vendor X’s 0.25 micrometer CMOS process
Can’t use with vendor X’s 0.18 micrometer process
Can’t use with vendor Y
Soft cores
Can be synthesized to nearly any technology
Can optimize for particular use
E.g., delete unused portion of core, giving lower power and smaller designs
Requires more design effort
May not work in technology not tested for
Not as optimized as hard core for the same processor, since hard cores have been given more
attention
Firm core advantages & disadvantages
Compromise between hard and soft cores
Some retargetability
Limited optimization
Better predictability/ease of use
New challenges to processor providers
Cores have dramatically changed the business model of GPP and SPP vendors.
The changes affect pricing models and IP protection
Pricing models
In the past
Vendors sold the product to designers as an IC
Designers had to buy any additional copies
• Could not (economically) copy from original
Today
Vendors can sell IP instead of the ICs themselves
Designers incorporate IP into an SOC
Designers can make as many copies as needed, while vendors still earn revenue through the pricing model
Vendors can use different pricing models
Royalty-based model
• Similar to old IC model
• Designer pays for each additional copy made
Fixed price model
• One price for IP, and designers can make as many copies as needed
Many other models used
IP protection (covered next)
IP protection
IP protection has become a key concern of core providers
In the past
Illegally copying an IC was very difficult
Reverse engineering required tremendous, deliberate effort
“Accidental” copying was not possible
Today
Cores sold in electronic format
Deliberate/accidental unauthorized copying is easier
Vendors pay great attention to safeguards when selling their products
Contracts between vendors and designers prohibit copying and distributing
the IP
Encryption techniques limit the actual exposure of the IP
Watermarking
Determines whether a particular instance of a processor was copied
and whether the copy was authorized
New challenges to processor users
Using GPP and SPP cores poses new challenges for designers
Licensing arrangements
Purchasing a core is not as easy as purchasing an IC
More contracts enforcing pricing model and IP protection, possibly requiring legal assistance
Extra design effort
Especially for soft cores
Must still be synthesized and tested
Minor differences in synthesis tools can cause problems
Verification requirements more difficult
Extensive testing for synthesized soft cores and soft/firm cores mapped to particular technology
Ensure correct synthesis
Timing and power vary between implementations
No direct access to a core once it has been integrated into a chip
Cores buried within IC
Cannot simply replace bad core like replacing bad IC in the past
Design process model
A design process model describes the order in which design steps are
processed; each step has many substeps.
1. Behavior description step
2. Behavior to structure conversion step
3. Mapping structure to physical implementation step
Waterfall model
Proceed to next step only after current step completed
Spiral model
Proceed through 3 steps in order but with less detail
Repeat 3 steps, gradually increasing detail
Keep repeating until desired system obtained
Becoming extremely popular (hardware & software development)
[Figures: waterfall design model (behavioral → structural → physical) and spiral design model (cycling through behavioral, structural, physical)]
Waterfall method
Suppose the designer has 6 months to build a system:
1. The designer starts by describing the behavior of the system completely, which may take two months.
2. Once fully satisfied that the behavior is correct, the designer moves on to the structural design, which may also take two months.
3. Once fully satisfied that the structure is correct, the designer carries out the physical implementation.
Drawbacks
Once we move to the next step, we cannot go back to the previous one
Not very realistic
Bugs often found in later steps that must be fixed in earlier step
E.g., when testing the structure we notice that we forgot to handle a certain input condition at the behavior
level
Prototype often needed to know complete desired behavior
E.g., customer adds features after product demo
System specifications commonly change
E.g., to remain competitive by reducing power or size, certain features are dropped
Unexpected iterations back through 3 steps cause missed deadlines
Lost revenues
May never make it to market
[Figure: waterfall design model (behavioral → structural → physical)]
Spiral method
Suppose the designer has 6 months to build a system:
1. The designer starts by describing the basic behavior of the system, knowingly incomplete, which may take a few weeks.
2. The designer proceeds to the structural design, which may also take a few weeks.
3. The designer creates a physical prototype of the system and uses it to test the basic functions.
4. Go back to the first step and continue.
First iteration of 3 steps incomplete
Much faster, though
End up with prototype
Use to test basic functions
Get idea of functions to add/remove
Original iteration experience helps in following iterations of 3 steps
Drawbacks:
The designer must come up with ways to obtain structure and physical implementations quickly
E.g., FPGAs for the prototype, silicon for the final product (generating new silicon takes a long time)
May have to use more tools
Extra tools may require extra effort/cost
Could require more time than the waterfall method, due to the overhead of creating physical
prototypes,
if the waterfall method would have produced a correct implementation the first time
[Figure: spiral design model cycling through behavioral, structural, and physical steps]
General-purpose processor design models
Previous slides focused on SPPs
Can apply equally to GPPs
Waterfall model
Structure developed by particular company
Acquired by embedded system designer
Designer develops software (behavior)
Designer maps application to architecture
Compilation
Manual design
Spiral-like model
Beginning to be applied by embedded system designers
Spiral-like model
Designer develops or acquires architecture
Develops application(s)
Maps application to architecture
Analyzes design metrics
Now makes choice
Modify mapping
Modify application(s) to better suit architecture
Modify architecture to better suit application(s)
Not as difficult now
Maturation of synthesis/compilers
IPs can be tuned
Continue refining to lower abstraction level until particular implementation chosen
[Figure: Y-chart in which architecture and application(s) feed mapping, followed by analysis]
Summary
Design technology seeks to reduce gap between IC
capacity growth and designer productivity growth
Synthesis has changed digital design
Increased IC capacity means sw/hw components
coexist on one chip
Design paradigm shift to core-based design
Simulation essential but hard
Spiral design process is popular
References
Embedded System Design: A Unified Hardware/Software Introduction
Frank Vahid and Tony Givargis