LOW POWER VLSI DESIGN
Assignment-1
G Abhishek Kumar Reddy, M Manoj Varma
Introduction
(Opening quotation attributed to Patrick P. Gelsinger, Senior Vice President, Intel Corp.)
Problems of Power dissipation
Higher power dissipation leads to:
Reduced operating time of the device
Heavier batteries in the device
Reduced mobility of the device due to the heavy battery
Greater cooling effort
Increased operational costs
Reduced reliability of the device.
The figure illustrates the impact of higher power consumption on the device battery.
Therefore, devices that consume power efficiently are needed. This leads us to the
discussion of Low Power VLSI Design.
Sources of power dissipation
The major sources of power dissipation are listed below:
Switching power is caused by the charging and discharging of the capacitors driven by the
circuit.
Short-circuit power is caused by short-circuit currents that arise when the NMOS and
PMOS transistors of a gate conduct simultaneously.
Leakage power originates from substrate injection and subthreshold effects. In older
technologies switching power was predominant, but as feature sizes keep shrinking,
leakage power becomes predominant.
Design for low power implies the ability to reduce all three of these components.
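Before looking at reduction techniques, it helps to see how large the three components can be. The following Python sketch evaluates the usual first-order expressions for switching, short-circuit and leakage power; all numeric parameters (activity factor, switched capacitance, peak short-circuit current, leakage current, frequency, supply) are assumed example values, not measurements.

alpha  = 0.1         # switching activity factor (assumed)
c_sw   = 1e-9        # total switched capacitance in farads (assumed)
v_dd   = 1.0         # supply voltage in volts
f_clk  = 500e6       # clock frequency in hertz
i_peak = 50e-3       # peak short-circuit current in amperes (assumed)
t_sc   = 50e-12      # duration of simultaneous conduction per transition (assumed)
i_leak = 2e-3        # total leakage current in amperes (assumed)

p_switching = alpha * c_sw * v_dd ** 2 * f_clk        # charging/discharging the load capacitors
p_short     = alpha * i_peak * t_sc * v_dd * f_clk    # direct VDD-to-GND current during transitions
p_leakage   = i_leak * v_dd                           # subthreshold and substrate leakage

for name, p in (("switching", p_switching),
                ("short-circuit", p_short),
                ("leakage", p_leakage)):
    print(f"{name:>14}: {p * 1e3:7.3f} mW")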
Principles of Low Power Design
Reduction in Switching voltage
Reducing the supply (switching) voltage dramatically reduces the dynamic power, since
dynamic power scales with the square of the voltage. The trade-off is that MOS transistor
performance is lost: voltage scaling slows the circuit down.
Reducing capacitance
Simply reducing the capacitance does not guarantee the desired saving. The goal here
is to reduce the product of capacitance and switching activity: signals with a high switching
frequency should be routed with minimum parasitic capacitance, and vice versa.
Reducing switching frequency
As stated before, the product of capacitance and the frequency of operation should
be reduced to achieve the goal of power reduction.
Dynamic power reduction techniques
As the scaling of the supply voltage has its own trade-offs, the following architectural
methods are employed to recover the performance lost to voltage reduction (a rough power
comparison is sketched below):
Parallel architecture (Parallelism)
Pipelined architecture (Pipelining)
Parallelism
Pipelining
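A minimal sketch of why these architectures save power, following the classic textbook comparison: the extra hardware restores throughput, so the supply voltage can be lowered and the quadratic voltage term dominates. The capacitance overheads and the reduced voltage of roughly 0.58 Vdd are assumed figures taken from that standard example, not results of this work.

def rel_power(c_scale, v_scale, f_scale):
    """Dynamic power relative to the reference datapath, using P = C * V^2 * f."""
    return c_scale * v_scale ** 2 * f_scale

reference = rel_power(1.0, 1.0, 1.0)
parallel  = rel_power(2.15, 0.58, 0.5)   # duplicated datapath, each copy clocked at f/2
pipelined = rel_power(1.15, 0.58, 1.0)   # extra pipeline registers, same clock frequency

print(f"parallel datapath : {parallel / reference:.2f} x reference power")   # about 0.36x
print(f"pipelined datapath: {pipelined / reference:.2f} x reference power")  # about 0.39x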
Trends observed:
Multiple Voltage techniques
Dual-Vdd techniques are sometimes used in order to maintain performance, wherein
Critical paths are assigned the high Vdd
Non-critical paths are assigned the low Vdd
To make use of this multi-Vdd concept, level converters are used to avoid the static current that
flows when a low-Vdd gate directly drives a high-Vdd gate whose PMOS transistor is then never
completely turned off.
While level converters have the advantage of removing this static current, quite a number of
problems limit their use:
Placement problem: Vddl and Vddh cells should be separated because they have different
n-well voltages
Usage of more silicon area
Increased delay
To tackle the problems of level converters, Clustered Voltage Scaling (CVS) was introduced, which
is shown below:
The CVS algorithm separates the high-Vdd and low-Vdd cells into two clusters, making use of the
above structure, so that level conversion is only needed at the register boundaries. The algorithm
is based on a Depth-First Search (DFS) traversal of the circuit.
There is an advanced variant of DFS known as partial DFS which greatly reduces the
computation time.
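The structural idea can be sketched in a few lines of Python: starting from the primary outputs, a DFS grows the low-Vdd cluster backwards as long as no low-Vdd gate would drive a high-Vdd gate and the gate still has timing slack to give away. The netlist format, the slack model and the gate names are illustrative assumptions, not the exact algorithm of any published CVS tool.

def cvs_assign(gates, fanout, slack, slack_margin):
    """Grow a low-Vdd cluster backwards from the primary outputs by DFS."""
    vdd = {g: "high" for g in gates}           # everything starts at the high Vdd
    fanin = {g: [] for g in gates}
    for g, outs in fanout.items():
        for o in outs:
            fanin[o].append(g)

    # Primary outputs (no fanout) feed registers, where level conversion is cheap,
    # so they form the starting frontier of the traversal.
    frontier = [g for g in gates if not fanout[g]]
    visited = set()
    while frontier:                             # iterative depth-first search
        g = frontier.pop()
        if g in visited:
            continue
        visited.add(g)
        fanouts_low = all(vdd[o] == "low" for o in fanout[g])
        if fanouts_low and slack[g] >= slack_margin:
            vdd[g] = "low"                      # join the low-Vdd cluster
            frontier.extend(fanin[g])           # try to grow the cluster backwards
    return vdd

# Tiny example: g3 drives a primary output, g1 has no slack to spare.
gates = ["g1", "g2", "g3"]
fanout = {"g1": ["g3"], "g2": ["g3"], "g3": []}
slack = {"g1": 0.0, "g2": 0.5, "g3": 0.4}
print(cvs_assign(gates, fanout, slack, slack_margin=0.2))
# -> {'g1': 'high', 'g2': 'low', 'g3': 'low'}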
Dynamic Voltage Scaling
Most systems do not, at every point in time, require the highest voltage and frequency to
complete their task. In such cases the system can operate at a comparatively lower voltage and
frequency and still finish the task within its deadline, with considerable savings in power. This is
shown in the diagram below.
An example: Nexus S processor specs
Running the Android 4.1.2 Jelly Bean OS.
ARM Cortex-A8 Hummingbird processor
Supports dynamic frequency scaling from 100 MHz to 1 GHz
Supports voltage scaling from 800 mV to 1500 mV
The supported frequencies along with their predefined operating voltages are as follows (a power
comparison sketch follows the list):
100 MHz -> 950 mV
200 MHz -> 950 mV
400 MHz -> 1050 mV
800 MHz -> 1200 mV
1000 MHz -> 1250 mV
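To see how much the lower operating points save, the sketch below compares each frequency/voltage pair against the 1000 MHz / 1250 mV point using the first-order scaling P proportional to f * V^2. Leakage is ignored and the proportionality is an assumption used only for illustration.

opp = {            # frequency in MHz -> predefined supply voltage in mV (table above)
    100:  950,
    200:  950,
    400:  1050,
    800:  1200,
    1000: 1250,
}

p_peak = 1000 * 1250 ** 2
for f, v in sorted(opp.items()):
    p_rel = (f * v ** 2) / p_peak
    print(f"{f:>4} MHz @ {v} mV -> {p_rel:6.1%} of peak dynamic power")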
Adaptive Voltage Scaling
AVS uses a novel closed-loop approach to voltage scaling. The AVS loop is closed around
the performance of the processor, eliminating any excess voltage margin. This is in contrast to
table-based DVS systems, which must include extra voltage margin. The AVS loop also provides
the ultimate Kelvin sense connection for the voltage regulator feedback by actually sensing the
voltage in the load. This eliminates voltage error due to the ground difference between the power
supply and the processor. The power supply tolerance is also trimmed out automatically by the
AVS loop, further reducing the margin voltage. All these reductions in the supply voltage yield
dramatic energy savings in the processor because the dynamic energy usage is proportional to
the square of the voltage.
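The closed-loop behaviour can be sketched as a simple control step that nudges the supply down whenever a delay monitor (for example a ring-oscillator replica of the critical path) reports that timing is still met, and nudges it up otherwise. The step size, voltage limits and the existence of such a monitor are assumptions for illustration; real AVS controllers are considerably more elaborate.

def avs_step(v_now, delay_now, delay_target, v_min, v_max, step=0.01):
    """One iteration of a simplified AVS control loop (voltages in volts)."""
    if delay_now > delay_target:        # monitor says the logic is too slow: raise Vdd
        return min(v_now + step, v_max)
    return max(v_now - step, v_min)     # timing met: try a slightly lower Vdd

# Repeated calls converge to the lowest supply that still meets timing, which is
# exactly the fixed margin a table-based DVS scheme would have to keep in reserve.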
General power saving flow
Switching activity reduction techniques
Logic restructuring
A chain structure has an overall lower switching activity than a tree implementation for
random inputs (a small probability sketch follows below).
This comes at the expense of delay, which is higher in the chain implementation than in the
tree implementation.
Sometimes glitches may be present.
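The claim about chains and trees can be checked with a short probability calculation. For a 4-input AND built from 2-input AND gates with independent random inputs, the 0->1 transition probability of a node whose one-probability is p equals p(1-p); summing this over the internal nodes gives the total switching activity. The structure below is the standard example; the assumption is that all inputs are 1 with probability 0.5 and temporally independent.

def p_and(pa, pb):
    return pa * pb                       # probability that a 2-input AND output is 1

def p_switch(p1):
    return p1 * (1.0 - p1)               # probability of a 0 -> 1 transition per cycle

p = 0.5                                   # each primary input is 1 half of the time

# Chain implementation: ((a & b) & c) & d
chain = [p_and(p, p)]
chain.append(p_and(chain[-1], p))
chain.append(p_and(chain[-1], p))

# Tree implementation: (a & b) & (c & d)
tree = [p_and(p, p), p_and(p, p)]
tree.append(p_and(tree[0], tree[1]))

print("chain switching activity:", sum(p_switch(x) for x in chain))   # ~0.355
print("tree  switching activity:", sum(p_switch(x) for x in tree))    # ~0.434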
Input ordering
Changing the order of the inputs may reduce the switching activity.
It is generally beneficial to postpone signals with a higher switching probability to a later stage.
Alternate encoding schemes
Instead of always using the binary encoding scheme, it is sometimes beneficial
to use other encoding schemes such as Gray code. The goal is to reduce the number of
bit switches between adjacent states.
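A quick way to see the benefit is to count bit toggles over a full count sequence, which is what drives the switching power of the state register. The sketch below compares a 4-bit binary count with its Gray-coded equivalent; the 4-bit width is an arbitrary choice for illustration.

def bit_toggles(seq):
    """Total number of bits that change across consecutive states."""
    return sum(bin(a ^ b).count("1") for a, b in zip(seq, seq[1:]))

n_bits = 4
states = list(range(2 ** n_bits))
binary = states
gray = [s ^ (s >> 1) for s in states]      # standard binary-to-Gray conversion

print("binary toggles:", bit_toggles(binary))   # 26 toggles for counting 0..15
print("gray   toggles:", bit_toggles(gray))     # 15 toggles: one per transition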
Bus power
Buses are a significant source of power dissipation. Power dissipation is mainly caused by
High switching activity
Large capacitive loading
For an n-bit bus: P_bus = n * f_clk * C_load * V_DD^2
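A back-of-the-envelope evaluation of this formula is sketched below. The bus width, clock frequency, line capacitance and supply voltage are assumed example values chosen only to show the order of magnitude involved.

n      = 32          # bus width in bits (assumed)
f_clk  = 200e6       # clock frequency in hertz (assumed)
c_load = 5e-12       # capacitive load per bus line in farads (assumed)
v_dd   = 1.2         # supply voltage in volts (assumed)

p_bus = n * f_clk * c_load * v_dd ** 2
print(f"P_bus = {p_bus * 1e3:.1f} mW")    # about 46 mW for these assumptions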
Bus power reduction techniques
Alternative bus structures
Segmented buses (lower C_load)
Charge recovery buses
Bus multiplexing (lower f_clk possible)
Minimizing bus traffic (n)
Code compression
Instruction loop buffers
Minimization of bit switching activity (f_clk) by data encoding
Minimize voltage swing (V_DD^2) using differential signaling
Segmented buses
Bus segmentation helps reduce the switched capacitance. When a single shared bus is connected
to all modules, this results in a large bus capacitance due to
Large number of drivers and receivers sharing the same bus
Parasitic capacitance of long bus line
By using a segmented bus, the overall routing area may increase, but the switched capacitance
during each bus access is reduced.
Bus segmentation architecture
Bus multiplexing
Involves sharing long data buses by time multiplexing.
During even cycles source S1 sends data, and during odd cycles source S2 sends data.
For a shared bus, the advantages of data correlation are lost:
Bus sharing should not be used for positively correlated data streams.
Bus sharing may prove advantageous for negatively correlated data streams (where
successive samples switch sign bits), since the switching is more random anyway.
Low Swing bus
Needs a strong reference voltage and makes use of a differential signaling scheme.
The power efficiency of this scheme depends on the extent of the voltage swing reduction. Since
dynamic energy scales roughly with the square of the swing, a swing of 0.1 Vdd may lead to about
a 99% saving in dynamic power.
Limitations
Producing a large on-chip capacitance is difficult, and the capacitor may have to be placed off-chip.
The ratio Cbus/Cin, which determines the voltage swing reduction, may be difficult to control
due to its sensitivity to process variations.
Clock gating
A part of the circuit that is not in use can have its clock gated, so that the computation in that part
of the circuit is not performed, thereby saving a considerable amount of power.
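A toy model makes the saving concrete: if a block is busy only a fraction of the cycles, gating its clock removes the clock-tree toggles in all the idle cycles. The 20% activity pattern below is an assumption used purely for illustration.

import random

random.seed(0)
cycles = 1000
enable = [random.random() < 0.2 for _ in range(cycles)]   # block is busy ~20% of the time

ungated_toggles = 2 * cycles            # two clock edges reach the block every cycle
gated_toggles   = 2 * sum(enable)       # edges are delivered only when the block is enabled

print("clock toggles without gating:", ungated_toggles)
print("clock toggles with gating:   ", gated_toggles)
print(f"saving on the gated clock net: {1 - gated_toggles / ungated_toggles:.0%}")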
Precomputation
An effective way of reducing power in which a small portion of the computation (for example, on
the most significant bits of the data) is performed ahead of the rest; when this precomputed result
already determines the output, the remaining logic is gated off and its switching activity is avoided.
The output is then mainly based on the precomputation done.
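The textbook illustration of precomputation is a wide comparator whose most significant bits are compared first: whenever they differ, the result is already known and the logic for the remaining bits can stay idle. The sketch below estimates how often that happens for uniformly random operands; the 16-bit width and the input distribution are assumptions.

import random

random.seed(1)

def compare_with_precomputation(a, b, n_bits=16):
    """Return (a > b, low_order_logic_idle) using MSB-first precomputation."""
    msb_a, msb_b = a >> (n_bits - 1), b >> (n_bits - 1)
    if msb_a != msb_b:                 # the precomputed MSB stage already decides
        return msb_a > msb_b, True
    return a > b, False                # full-width comparison is still needed

samples = [(random.getrandbits(16), random.getrandbits(16)) for _ in range(10000)]
idle = sum(1 for a, b in samples if compare_with_precomputation(a, b)[1])
print(f"low-order comparator gated off in {idle / len(samples):.0%} of cycles")  # ~50%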
Short circuit power
Inputs have finite rise and fall times, depending mainly on the device sizes.
A direct current path exists from VDD to GND while the PMOS and NMOS are ON simultaneously
for a short period.
Interconnect delay
In deep submicron (DSM) technologies, interconnects no longer behave as ideal wires; they
have associated parasitics such as resistance, capacitance and inductance. With a linear increase in
interconnect length, both the interconnect capacitance (C) and interconnect resistance (R) increase
linearly, making the RC delay increase quadratically.
The total RC delay of an interconnect line can be reduced drastically with the insertion of
signal amplifiers known as repeaters. However, buffer insertion is becoming a bulky technique in
DSM technologies, which requires the solution to be found with a different approach.
Signals on an interconnect get highly distorted due to propagation delay and the coupling effects
of adjacent lines. Hence, along with power and delay, noise is also an important point
to be considered while developing an algorithm or technique for better transmission.
Reduction of delay and power consumption is the main motivation behind the repeater/buffer
insertion technique. In this technique a long interconnect is broken into smaller pieces that are joined
by CMOS buffers. For example, assume a long interconnect has 5 units of resistance and 10
units of capacitance, so the total RC delay is 50 units. However, if five repeaters are inserted
along this line to break the interconnect into five equal pieces, the wire RC delay becomes
1x2 + 1x2 + 1x2 + 1x2 + 1x2 = 10 units. If the delay of the five repeaters themselves is less than
40 units, there is a speed benefit to inserting the CMOS repeaters. The solution to this problem has
been approached in the same manner, as sketched below.
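The arithmetic above can be wrapped in a small helper so that different segment counts and repeater delays can be tried. The lumped model (segment delay = R_seg * C_seg, plus a fixed delay per repeater) is a simplifying assumption and only reproduces the 50-unit versus 10-unit comparison from the text.

def wire_delay(r_total, c_total, segments, repeater_delay=0.0):
    """Delay of a wire split into equal segments by repeaters (lumped RC model)."""
    r_seg = r_total / segments
    c_seg = c_total / segments
    return segments * (r_seg * c_seg + repeater_delay)

print(wire_delay(5, 10, segments=1))                      # 50.0 units, no repeaters
print(wire_delay(5, 10, segments=5))                      # 10.0 units, wire delay only
print(wire_delay(5, 10, segments=5, repeater_delay=3))    # 25.0 units including repeaters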
Buffer insertion is a very effective approach for delay reduction. However, in every new
generation of deep submicron technology, buffer insertion is becoming a major problem, because of
the sheer number of buffers required and because they are now a major source of power dissipation.
Hence a trade-off is required between delay and power consumed, and there is a need for a new
approach that, while reducing the delay, also consumes less power.
The Schmitt trigger is a special logic element adjusted to work with analog input signals. Its
primary purpose is to restore the shape of digital signals, so this element
can replace a buffer as far as restoring the signal is concerned. Because of transmission line effects,
the digital waveform is transformed from a square wave into a trapezoid, a triangle, or a more
complex signal.
The benefit of a Schmitt trigger over a circuit with only a single input threshold (such as a buffer)
is its greater stability (noise immunity). With only one input threshold, a noisy input signal near
that threshold can cause the output to switch rapidly back and forth due to the noise alone. A noisy
input signal near one threshold of a Schmitt trigger can cause only one switch in the output value,
after which the input would have to move beyond the other threshold in order to cause another
switch. A Schmitt trigger can easily be implemented with six CMOS transistors.
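The noise-immunity argument can be demonstrated behaviourally: a single-threshold buffer chatters when a slowly rising, noisy input hovers near its threshold, while a Schmitt trigger with hysteresis switches exactly once. The threshold values and the ramp-plus-noise input below are assumptions chosen for illustration only.

import random

random.seed(2)

def buffer_out(signal, vth=0.5):
    """Single-threshold buffer: output follows the instantaneous comparison."""
    return [1 if v > vth else 0 for v in signal]

def schmitt_out(signal, v_low=0.4, v_high=0.6):
    """Schmitt trigger: output changes only when the input crosses the far threshold."""
    out, state = [], 0
    for v in signal:
        if state == 0 and v > v_high:
            state = 1
        elif state == 1 and v < v_low:
            state = 0
        out.append(state)
    return out

def toggles(bits):
    return sum(a != b for a, b in zip(bits, bits[1:]))

# A slow 0 -> 1 ramp with noise riding on it, as on a long, lossy interconnect.
signal = [i / 200 + random.uniform(-0.05, 0.05) for i in range(200)]

print("buffer  output toggles:", toggles(buffer_out(signal)))   # several spurious edges
print("schmitt output toggles:", toggles(schmitt_out(signal)))  # exactly one clean edge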
Components of dynamic power dissipation due to different capacitance sources: gate
capacitance, diffusion capacitance, and interconnect capacitance.
Interconnect delay
T_clk,min = T_C-Q + T_int + T_logic,max + T_setup
where T_C-Q is the time required for the data to leave the launching register after the clock signal
arrives, T_int is the interconnect delay, T_logic,max is the maximum logic gate delay, and T_setup is
the required setup time of the receiving register. From this relation, by reducing T_int,
the clock period can be decreased, increasing the overall clock frequency of the circuit (assuming
the data path is a critical path).
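A small numeric example shows the effect: with the other terms fixed, shrinking the interconnect delay directly shrinks the minimum clock period. The delay values (in nanoseconds) are assumed example numbers, not taken from a real design.

t_cq, t_logic_max, t_setup = 0.10, 1.50, 0.05      # ns, assumed example values

for t_int in (1.00, 0.40):                          # interconnect delay before/after optimisation
    t_clk_min = t_cq + t_int + t_logic_max + t_setup
    print(f"T_int = {t_int:.2f} ns -> T_clk,min = {t_clk_min:.2f} ns "
          f"({1.0 / t_clk_min:.2f} GHz max clock)")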
Noise
With interconnect scaling, coupling capacitance between (and among) interconnects dominates the
ground capacitance. Furthermore, inductive coupling has to be considered due to increasing signal
frequencies, making coupling noise more significant (and complicated). Interconnect coupling
induced noise can be classified into two categories: voltage-level noise and delay uncertainty, as
shown in Fig. 3. Noise may cause a malfunction in the circuit if the noise
level is greater than a certain threshold, thereby reducing yield. In addition to coupling effects,
delay uncertainty can also be caused by other factors, such as process variations (on both
interconnects and the inserted repeaters or pipeline registers), temperature variations, and
power/ground noise. Delay uncertainty is both spatially dependent (due to process variations) and
temporally dependent (due to coupling, temperature variations, and power/ground noise). Timing
margins are assigned to manage this delay uncertainty, thereby increasing the clock period and
reducing the overall performance of the circuits. When delay uncertainty exceeds these margins,
setup or hold violations may occur, reducing the yield.