Power Optimisation for a 32-bit RISC Processor
Joanne Acland
ST Microelectronics
[email protected]
ABSTRACT
Low power consumption is a vital feature of many kinds of ICs today. Power Compiler is
an easy-to-use tool that helps designers achieve a very low power design through several
automatic power-oriented methodologies. After a quick review of Power Compiler's features and
advantages, we describe the flow tested in our ST group and the targets we set for it. We then
present the results obtained on a full design, a 32-bit RISC processor, and compare them with
our classic hand-made flow. We conclude with the advantages and limitations we found with
this tool on this design.
SNUG Europe 2005 Power Optimisation for a 32-bit RISC processor
1 Introduction
A reduced area and low power consumption are important requirements for smartcard ICs.
Because these requirements are increasingly aggressive, low-area and low-power techniques
must be adopted throughout the design flow.
The purpose of this paper is to discuss the Power Compiler features available to reduce power
consumption and to detail the flow tested on a 32-bit microprocessor destined for smartcard
applications. To improve our current methodology, we decided to introduce power
optimization techniques at the synthesis stage of the flow using Power Compiler, in particular
the automatic insertion of gated clocks.
The SmartJ™ microprocessor core is a 32-bit Reduced Instruction Set Computer (RISC)
designed to execute both native RISC instructions and JavaCard™ Technology instructions
(bytecodes) directly. The SmartJ™ microprocessor has been specifically upgraded in order to
integrate state-of-the-art security mechanisms required by smartcard applications.
2 Power Compiler clock gating
One of Power Compiler's features is the automatic instantiation of clock gating circuitry
based on an analysis of the RTL functionality. This chapter introduces the principles of
automatic clock gating, its benefits and the Power Compiler design flow.
2.1 Principle
Automatic clock gating insertion is invoked with the insert_clock_gating command on a GTECH
netlist. When this command is used, the multiplexers and feedback loops that implement a
synchronous load-enable on a multi-bit register are replaced by a single clock-gating cell.
The replacement principle is illustrated in the schematics below:
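Figure 1: Power Compiler principle
[Schematic. Before: a register bank with D_IN, EN and CLK inputs and a D_OUT output.
After: EN and CLK feed a latch-based gating cell whose output G_CLK clocks the register
bank, which keeps its D_IN input and D_OUT output.]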
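The replacement can be sketched in RTL as follows. This is a minimal illustration only: the module names, port names and the WIDTH parameter are hypothetical, and the second module is a hand-written behavioural equivalent of a latch-based gating cell, not the cell that Power Compiler actually instantiates.

```verilog
// Style 1: synchronous load-enable as usually written in RTL.
// Without clock gating, synthesis builds one 2:1 multiplexer per bit
// that feeds the flip-flop output back to its input when en is low.
module regbank_load_enable #(parameter WIDTH = 32) (
  input  wire             clk,
  input  wire             en,
  input  wire [WIDTH-1:0] d_in,
  output reg  [WIDTH-1:0] d_out
);
  always @(posedge clk)
    if (en) d_out <= d_in;
endmodule

// Style 2: the same behaviour with a single latch-based gated clock.
// (Assumed structure for illustration; names are not from the paper.)
module regbank_gated_clock #(parameter WIDTH = 32) (
  input  wire             clk,
  input  wire             en,
  input  wire [WIDTH-1:0] d_in,
  output reg  [WIDTH-1:0] d_out
);
  reg  en_latched;
  wire g_clk;

  // Latch transparent while clk is low: en is captured during the low
  // phase and held while clk is high, so g_clk cannot glitch.
  always @(clk or en)
    if (!clk) en_latched <= en;

  assign g_clk = clk & en_latched;

  // The register bank is clocked only when the enable was asserted.
  always @(posedge g_clk)
    d_out <= d_in;
endmodule
```

In the second style the per-bit multiplexers disappear and the WIDTH flip-flops see a clock edge only in cycles where en was high.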
2.2 Benefits
The major expected advantage of the method is the reduction of power consumption since multi-
bit registers that have been automatically gated by the tool will only receive a clock when they
really need to change their contents.
However, this is not the only possible advantage: replacing 32 multiplexers with a single clock
gating cell, even a large one, tends to reduce the design area. This is because the combined area
of the multiplexer cells is larger than the area of the clock gating cell, and because routing
congestion is reduced.
The multiplexer replacement also has a timing impact: it can remove one level of logic from the
datapath, which may be considered an improvement.
Finally, ease of use and setup is one of Power Compiler's big advantages. It is very
simple to invoke from Design Compiler or Physical Compiler, it saves days of RTL coding
compared with manual insertion of the clock gating cells, and it helps to keep the design
library-independent.
2.3 Other available clock gating features in Power Compiler
2.3.1 Enhanced Clock Gating (works through hierarchy)
[Schematic. Two register banks, a and b, each of width 2, share the enable EN and the clock
CLK. Gated separately, each bank causes a min-width violation when the minimum bitwidth is
at its default of 3. Common EN factoring allows additional clock gating, with the banks
clocked by the gated clock GCLK.]
Figure 2: Enhanced Clock Gating
2.3.2 Multistage Clock Gating
[Schematic. Regular clock gating: two register banks, each with its own ICG, enabled by
EN=AB and EN=AC respectively. Multi-stage clock gating: the common term A drives an ICG
moved closer to the clock root, followed by separate ICGs for B and C in front of each register
bank. The factored EN gives more power savings.]
Figure 3: Multistage Clock Gating
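The factoring of the enables EN=AB and EN=AC can be sketched as a two-stage structure. This is an illustrative behavioural model under stated assumptions: icg_model is a hypothetical stand-in for a library integrated clock-gating cell, and all module, port and parameter names are invented for the example.

```verilog
// Hypothetical behavioural model of a latch-based clock-gating cell.
module icg_model (
  input  wire clk,
  input  wire en,
  output wire g_clk
);
  reg en_latched;
  // Latch transparent while clk is low, holding while clk is high.
  always @(clk or en)
    if (!clk) en_latched <= en;
  assign g_clk = clk & en_latched;
endmodule

// Two register banks whose enables share the common term a.
module multistage_gating #(parameter WIDTH = 8) (
  input  wire             clk,
  input  wire             a,
  input  wire             b,
  input  wire             c,
  input  wire [WIDTH-1:0] d1,
  input  wire [WIDTH-1:0] d2,
  output reg  [WIDTH-1:0] q1,
  output reg  [WIDTH-1:0] q2
);
  wire clk_a, clk_ab, clk_ac;

  // First stage, closer to the clock root: gated by the common term a.
  icg_model stage1   (.clk(clk),   .en(a), .g_clk(clk_a));
  // Second stage: one gate per bank for the remaining terms b and c.
  icg_model stage2_b (.clk(clk_a), .en(b), .g_clk(clk_ab));
  icg_model stage2_c (.clk(clk_a), .en(c), .g_clk(clk_ac));

  always @(posedge clk_ab) q1 <= d1;  // effective enable: a AND b
  always @(posedge clk_ac) q2 <= d2;  // effective enable: a AND c
endmodule
```

When a is low, the whole clock branch below the first stage stops toggling, including the second-stage gating cells, which is where the extra saving comes from.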
2.3.3 Module level Clock Gating
[Schematic. Labels: Module 1, Module 2, CLK, EN, CG, Gated CLK, Register (twice), RAM.]
Figure 4: Module level clock gating: Power Compiler automatically replaces the
handcrafted Clock Gating cells with the desired clock gating style
2.3.4 Power Compiler with Physical Compiler
When Power Compiler is used within Physical Compiler, the clock-gating cells are placed with
respect to the placement of the registers they gate. The gating cells are placed as close
as possible to the flip-flops they drive, using a soft bound constraint.
If the clock gating cells are not fully integrated and are thus made of several discrete elements,
Physical Compiler places those elements very close together, using a hard bound constraint, in
order to achieve a gated clock without glitches.
Moreover, Physical Compiler provides user-controlled capabilities to rewire and remove
Power Compiler clock gating in order to achieve timing goals. Power Compiler
within Physical Compiler helps designers to manage skew and timing effectively.
3 What do we want to achieve?
The aim of inserting gated clocks was to reduce power consumption and to reduce area by
removing the multiplexers from the data path. Using Power Compiler brought an
additional advantage. Previously, gated clocks had been inserted into our design by hand in
the RTL. These gated clocks had to be changed each time the functionality changed, which
increased the complexity and risk associated with modifications. By using Power
Compiler to insert the gated clocks, we reduced the complexity of our RTL.
Furthermore, the gated clocks inserted by hand were not recognized
automatically by DC, PrimeTime or the back-end flow tools. With Power Compiler,
gated clocks can be identified and test considerations are taken into account
automatically.
Another complication with the gated clocks inserted by hand arose because, in order to
avoid glitches, the enable had to be generated before the falling edge of the clock; this path
therefore became critical when the clock period was reduced.
The number of gated clocks inserted by hand was limited: approximately 48 percent of the
registers were gated by hand. The aim was therefore for Power Compiler to insert at least as
many gated clocks, and ideally more, which would further reduce not only the power
consumption but also the area.
4 Results
These results were obtained by using the classic and the enhanced clock gating (with no
hierarchy crossing) methodologies only.
4.1 Number of clock gating elements and gated registers
DC version   Clock gating elements   Gated registers   Ungated registers
Hand made    46                      957 (48.83%)      1003 (51.17%)
2003.12      53                      1382 (70.51%)     578 (29.49%)
2004.12      58                      1498 (76.43%)     462 (23.57%)
Remark: module level, multi-stage and enhanced hierarchical CG options were not used.
The difference between 2004.12 and 2003.12 comes from the fact that, in the latest version, Power
Compiler is also able to insert clock gating on external feedback configurations and on
always-enabled registers.
As the table above shows, many more registers are gated with Power Compiler than with the
hand-made clock gating.
4.2 Timing
Clock gating option                      Slack
No clock gating                          0
Original manual clock gating             0
Power Compiler automatic clock gating    0
4.3 Area
Clock gating option                      Area
No clock gating                          898650
Original manual clock gating             894255
Power Compiler automatic clock gating    824176
The area gain obtained through automatic gated clock insertion was approximately 8%
compared with the manual gated clock insertion in the RTL.
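As a check, the relative gains follow directly from the area values in the table (the source does not state the area unit). The first line compares automatic gating with manual gating, the second compares automatic gating with no gating, and the third compares manual gating with no gating:

```latex
\[
\frac{894255 - 824176}{894255} = \frac{70079}{894255} \approx 7.8\%
\]
\[
\frac{898650 - 824176}{898650} = \frac{74474}{898650} \approx 8.3\%
\]
\[
\frac{898650 - 894255}{898650} = \frac{4395}{898650} \approx 0.5\%
\]
```

On these numbers, the manual gating on its own changed the area very little, while most of the reduction came with the automatic insertion.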
5 Problems encountered
During the investigation, several problems linked to the detection of gated clocks were
encountered with Design Compiler. Many of these have been corrected; however, one or two
problems still remain. The following RTL examples show registers for which gated clocks are
not inserted automatically. Synopsys will investigate supporting these cases in a future release.
Example 1:
always @(posedge clk or negedge nRESET)
if (!nRESET) signal1 <= `paramReset;
else signal1 <= signal1Next;
always @(A or B or C or D or E or F or G or
H or I or signal1) begin
if (A) signal1Next = B[23:2];
else if (!C && D) signal1Next = E[23:2];
else if (F) signal1Next = G;
else if (H) signal1Next = I;
else signal1Next = signal1;
end
Example 2:
always @(posedge clk or negedge nRESET)
if (!nRESET) signal2 <= 24'h000000;
else signal2 <= signal2Next;
always @(A or B or signal2 or C or D or E or F or G) begin
if (A)
if (B) signal2Next = {signal2[23:16], C[15:0]};
else signal2Next = C;
else if (D) signal2Next = {E[23:0]};
else if (F) signal2Next = {signal2[23:21], G};
else signal2Next = signal2;
end
Example 3:
always @(posedge clk or negedge nRESET)
if (!nRESET) signal3 <= 24'h000000;
else signal3 <= signal3Next;
always @(A or B or C or signal3) begin
if (!C) signal3Next = A + B;
else signal3Next = signal3;
end
6 Conclusions
In conclusion, Power Compiler achieved the goals required of automatic gated clock
insertion:
- The final design had more gated clocks inserted automatically than we had
previously added by hand.
- In addition, the final netlist was easier to route.