Clock Power Optimization
Authorized licensed use limited to: NXP Semiconductors. Downloaded on March 02,2020 at 08:41:53 UTC from IEEE Xplore. Restrictions apply.
This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.
to avoid significant timing issues. An MBFF merges multiple single-bit flip-flops into a single flip-flop [16]–[21]. In this case, power consumption can be considerably reduced because the individual single-bit flip-flops are merged into one such unit. After placement, single-bit flip-flops that can be merged together are identified and replaced by MBFFs. The other approach for clock sink optimization is to use register banking. While MBFF requires the development of a new MBFF cell library, register banking identifies multiple flip-flops and forms a bank in a similar manner to register files [22], [23]. To introduce MBFF or register banking, the logic signals connected to the flip-flops need to be rerouted. This introduces timing violations or routing issues. To resolve these problems, a timing engineering change order (ECO) or debanking is required, and the overhead is not negligible.

This article introduces a new clock network optimization method which lines up flip-flops to reduce clock network power. Unlike conventional methods which involve the restructuring of flip-flops, such as MBFF or register banking, the proposed method performs an integrated clock gating (ICG) cell-based in-line flip-flop relocation to reduce both the wire capacitance and wire length of a clock network without any changes in the clock structure. The number of vias is also reduced because the in-lined flip-flops allow for straight interconnections. The flip-flop relocation is performed over a short distance within a timing error-free region in the same clock network. This helps avoid additional timing violations or other constraint violations such as maximum capacitance and maximum transition. Hence, the proposed method can be effectively applied to timing-critical or turn-around time (TAT) critical designs.

The proposed flip-flop alignment method is a simple yet very effective method for clock power optimization. The significant features of this method are summarized as follows.

1) The proposed method reduces wire capacitance, wire length, and via counts in a clock network by relocating flip-flops in lines. This helps reduce the cell area of clock components such as clock buffers and ICGs because of the reduced load capacitance.
2) The proposed method does not cause timing degradation or other constraint violations, such as maximum capacitance and maximum transition violations, because flip-flops are relocated within a timing violation-free region. Since the method relocates flip-flops within a short distance, it does not introduce additional routing congestion.
3) The proposed method does not have additional run-time and TAT overhead. Since the method uses precalculated timing information without an additional timing update, the run-time overhead is negligible. Moreover, the method does not require additional cell libraries such as a special library for MBFF.

Unlike the previous clock network optimization methods such as flip-flop merging or flip-flop clumping techniques, the proposed algorithm aligns flip-flops to physically simplify the clock network wiring. This article describes the details of the alignment methodology and evaluates the proposed method. The rest of this article is organized as follows. Section II provides background and preliminaries for this article. Sections III and IV explain the proposed method in detail. Section V presents the experimental results and analysis, and Section VI concludes this article.

II. BACKGROUND AND PRELIMINARIES

The clock network plays a major role in power consumption in IC design. Clock networks account for most of the dynamic power consumption because a clock signal generally has a higher toggling ratio than other data signals. Therefore, it is critical to reduce the clock power by reducing the dynamic power of the clock network. Following [24] and [25], dynamic power can be modeled as

$$P_{dyn} = \alpha \cdot C_{tot} \cdot V_{dd}^{2} \cdot f \tag{1}$$

where $\alpha$, $C_{tot}$, $V_{dd}$, and $f$ are the switching activity, total capacitance, supply voltage, and operating frequency, respectively. The switching activity is determined by the function of the design. Controlling $f$ and $V_{dd}$ would be good ways to reduce dynamic power [26]. However, under the same clock specification and clocking method, the capacitance should be decreased to reduce the dynamic power. In a clock network, the capacitance is mainly composed of the pin capacitance of the flip-flops and the wire capacitance between flip-flops. Flip-flop sizing or merging could reduce the pin capacitance. On the other hand, the wire capacitance can be decreased by reducing the wire length and the number of vias. Wire length optimization of the leaf-level clock network is effective in reducing the wire capacitance because 80% of the total wire capacitance of a clock network is accounted for by the wires between the flip-flops and their driving cells [27], [28]. This wire optimization includes reducing the length of the leaf wire and the number of vias.

This section explains the preliminaries commonly used for clock power reduction: the ICG cell, MBFF, and register banking.

1) ICG: An ICG cell is an integrated clock gating cell which can enable or disable a clock signal [29], [30]. One or more flip-flops are connected to a single ICG cell and their outputs are controlled by the clock enable signal of the ICG cell. By removing unnecessary clock toggles, the dynamic power of the clock network can be reduced. Most recent IC designs use ICG cells for dynamic power reduction.
2) MBFF: Two or more single-bit flip-flops are merged to form a single MBFF. Fig. 1(a) shows a 2-bit MBFF formed by merging two single-bit flip-flops. Because the MBFF removes an inverter pair in a single-bit flip-flop, the cell area and pin capacitance can be reduced [17]–[19]. As more single-bit flip-flops are merged into a larger MBFF, more congestion in placement or routing is expected, and this would require effective measures to resolve congestion issues. It also requires the development of an MBFF cell library.
3) Register Banking: Register banking groups multiple flip-flops and forms a bank such as 2 × 2, 4 × 4, 4 × 8, and so on. Fig. 1(b) illustrates a 2 × 4 register bank. This structure can considerably reduce the capacitance and length of a wire in a leaf-level clock network by placing
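As a rough illustration of the dynamic power model in (1) from Section II, the following minimal sketch evaluates the formula for a baseline clock net and for the same net with a lower total capacitance. All numeric values (the 100 pF and 80 pF capacitances, the 0.8 V supply, the switching activity of 1.0) are hypothetical and chosen only for illustration; the 2.1 GHz frequency simply mirrors the 2100 MHz figure quoted for IP1 later in the article. The function name `dynamic_power` is also an assumption of this sketch.

```python
def dynamic_power(alpha: float, c_tot: float, v_dd: float, freq: float) -> float:
    """Evaluate P_dyn = alpha * C_tot * V_dd^2 * f (SI units, result in watts)."""
    return alpha * c_tot * v_dd ** 2 * freq


# Hypothetical operating point: only the capacitance differs between the two cases.
ALPHA = 1.0    # assumed switching activity of the clock net
V_DD = 0.8     # assumed supply voltage in volts
FREQ = 2.1e9   # operating frequency in hertz

baseline = dynamic_power(ALPHA, 100e-12, V_DD, FREQ)  # assumed 100 pF total capacitance
reduced = dynamic_power(ALPHA, 80e-12, V_DD, FREQ)    # assumed 80 pF after optimization

# With alpha, V_dd, and f fixed, the relative saving equals the relative capacitance reduction.
saving = 1.0 - reduced / baseline

print(f"baseline: {baseline:.4f} W, reduced: {reduced:.4f} W, saving: {saving:.1%}")
```

Because (1) is linear in $C_{tot}$, any relative reduction in total capacitance translates into the same relative reduction in dynamic power when the clock specification is unchanged, which is the motivation for targeting wire capacitance.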
KWON et al.: VIRTUAL TILE-BASED FLIP-FLOP ALIGNMENT METHODOLOGY FOR CLOCK NETWORK POWER OPTIMIZATION 3
Fig. 6. Examples of virtual tile creation around (a) hard macro and
(b) common placement area.
Algorithm 2 Algorithm for Effective Column Selection

finds an alignment column ($C_{eff}$) with the most virtual tiles, and $C_{eff}$ determines how many flip-flops can be aligned on it. The flip-flops yielding the smallest $d_{total}$ are aligned on $C_{eff}$. The other flip-flops, which are not aligned on $C_{eff}$, are relocated on either side of $C_{eff}$ to a column with a smaller $d_{total}$, and the corresponding column becomes a secondary alignment column in $R_A$. Algorithm 2 explains the procedure for finding the most effective column in $R_A$. Once $C_{eff}$ is selected, the optimal locations of the flip-flops are determined considering the minimum moving distances.

D. FF Alignment and Placement Legalization

The flip-flop alignment is performed at the ICG granularity. After finishing each alignment, the algorithm generates commands for flip-flop relocations with a corresponding P&R tool command and updates a script for flip-flop relocations. The tiles (placement blockages) used for the current alignment are removed to avoid their usage for other alignments. Once every alignment is finished, the final script, which includes the relocation commands for the flip-flops, is applied and the flip-flops change their actual physical locations all at once. Then all the remaining virtual tiles are removed before the placement legalization that follows. During flip-flop relocations, the attributes for location fixing of ICG cells are set to 1 in the P&R tool. Therefore, there is no change in ICG cell locations.

Once flip-flops change their locations, there is a possibility that the relocated flip-flops overlap with existing combinational cells which were originally placed on the corresponding virtual tiles. To resolve this cell overlap, the proposed method performs a two-step placement legalization. First, it fixes the placement of all flip-flops, and then the overlapping combinational cells are relocated to the empty locations where the flip-flops were originally placed. Next, it legalizes all cells together, including flip-flops, to remove the remaining cell overlaps such as an overlap between flip-flops and pre-fixed cells. Because each flip-flop is moved by the smallest distance ($d_{min}$), an overlapping combinational cell is relocated within a small distance. Thus, this does not introduce much change in design constraints, including timing. In addition, because this is not the final stage of the physical implementation flow, the few newly generated DRC violations can be fixed during the remaining design steps such as clock network synthesis or routing.

V. EXPERIMENTAL RESULTS

This section evaluates the proposed clock network optimization. Five industrial IPs (IP1–IP5) in mobile SoCs are evaluated, which are implemented with state-of-the-art 14- and 10-nm process technologies. Both CPU and GPU IPs, which have extremely high operating frequency and cell density, are also considered (IP1 and IP3). The gate-level netlists are generated by Synopsys Design Compiler, and the proposed method is integrated with Synopsys IC Compiler II. For routing layers, six metal layers and seven metal layers are used for the 14- and 10-nm designs, respectively. The numbers of layers for signal routing are 10 and 11 for each process technology. Each IP is implemented with multicorner multimode (MCMM) scenarios. The numbers of scenarios considered are 8, 6, 10, and 8 for IP1, IP2, IP3, and IP4, respectively. Because all IPs are evaluated under complete physical implementation flows, including physical verifications such as signoff DRC check or electromigration analysis, the experimental results are provided with high confidence.

A. Flip-Flop Alignment and Routing Result

To illustrate the flip-flop alignment results of the proposed method, two ICG cells and the flip-flops belonging to them are chosen from IP1. In Fig. 9, the cell colored in red is the ICG cell, and the flip-flops connected to the ICG cell (referred to as fan-out flip-flops) are highlighted in green. The figures are captured after finishing all physical implementation steps, including signal routing and post-routing optimization. The clock net is highlighted in yellow. The left figures in Fig. 9(a) and (b) are the results obtained from conventional design flows without the proposed method. The results from the proposed alignment method are presented in the right figures in Fig. 9(a) and (b). As can be seen, all flip-flops are aligned in three alignment columns. A flip-flop normally has a horizontally long pin shape
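The column selection step described in connection with Algorithm 2 (choose the column with the most virtual tiles, align the flip-flops with the smallest moving distance on it, and leave the rest for a secondary column) can be sketched as follows. This is only an illustrative sketch under stated assumptions, not the authors' Algorithm 2: the data layout (`tiles_by_column`), the `FlipFlop` class, the function names, the use of Manhattan distance as the moving distance, and the greedy assignment order are all assumptions introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FlipFlop:
    name: str
    x: float
    y: float


def select_effective_column(tiles_by_column):
    """Return the x-coordinate of the column with the most free virtual tiles.

    Ties are broken in favor of the smallest x-coordinate (an arbitrary choice).
    """
    return max(sorted(tiles_by_column), key=lambda x: len(tiles_by_column[x]))


def align_on_column(flip_flops, column_x, tile_ys):
    """Greedily place flip-flops on the free tiles of one column.

    Flip-flops are ranked by their smallest possible Manhattan moving distance
    to the column; each takes the nearest tile that is still free. Flip-flops
    left without a tile are returned so a secondary column can handle them.
    """
    free = list(tile_ys)

    def best_distance(ff):
        return min(abs(ff.x - column_x) + abs(ff.y - y) for y in tile_ys)

    assignments, leftovers = {}, []
    for ff in sorted(flip_flops, key=best_distance):
        if not free:
            leftovers.append(ff)
            continue
        nearest = min(free, key=lambda y: abs(ff.y - y))
        free.remove(nearest)
        assignments[ff.name] = (column_x, nearest)
    return assignments, leftovers


# Hypothetical alignment region: two candidate columns and four fan-out flip-flops.
tiles = {10.0: [0.0, 2.0, 4.0], 14.0: [0.0, 2.0]}
ffs = [
    FlipFlop("ff_a", 9.0, 0.5),
    FlipFlop("ff_b", 11.0, 3.6),
    FlipFlop("ff_c", 12.0, 2.2),
    FlipFlop("ff_d", 15.0, 1.0),
]

c_eff = select_effective_column(tiles)
assignments, leftovers = align_on_column(ffs, c_eff, tiles[c_eff])
print(c_eff, assignments, [ff.name for ff in leftovers])
```

In this toy example the column at x = 10.0 is selected because it offers three tiles, three flip-flops are aligned on it, and the remaining one would be passed to a neighboring column in a second pass. A greedy assignment does not guarantee the globally smallest total moving distance; it is used here only to keep the sketch short.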
TABLE I
CLOCK NETWORK IMPROVEMENT ON LEAF LEVEL FOR ICG1 AND ICG2
TABLE II
CLOCK NETWORK IMPROVEMENT FOR FOUR DIFFERENT IPS
and pin capacitance values, respectively. The smallest wire capacitance value is obtained with the proposed method because the proposed alignment algorithm aligns the flip-flops. Fig. 13 illustrates the wire capacitance changes ($C_{wire}$) in the clock leaf nets for all ICG cells in IP1. A negative value means that the wire capacitance has decreased after alignment. Fig. 13 shows that although $C_{wire}$ increases in approximately 10% of the cases, most of the increases are small and less than 1 pF. For IP1, the total wire capacitance reduction by the proposed method is 22.66%, while the MBFF method yields −34.16% (that is, an increased wire capacitance). The MBFF method results in a higher wire capacitance than the original method because of unexpectedly long clock routes: IP1 is a CPU design with an extremely high operating frequency of 2100 MHz and a cell utilization of over 90%. With tight design constraints (timing, utilization, and so on), it is difficult for MBFF methods to achieve a high MBFF merging ratio as
TABLE III
OVERALL DESIGN IMPROVEMENT FOR FOUR DIFFERENT IPS
reported in Table II. The large cell size of MBFFs makes their placement difficult, which limits the wire capacitance reduction. Other IPs also show a significant reduction in wire capacitance of 14.09%, 16.56%, 20.03%, and 23.20% for IP2, IP3, IP4, and IP5, respectively, with the proposed method. Similar pin capacitance values are found because the original flow and the proposed method do not change the number of flip-flops. However, MBFF merges multiple single-bit flip-flops into a single flip-flop, and this helps reduce the pin capacitance. The number of vias on the clock network, shown in the sixth column, is also reduced since the proposed method relocates the flip-flops onto a single line where possible. It should be noted that the area of the clock cells is smaller than in the original and MBFF cases. This is because the clock network can use smaller cells than in the conventional methods due to the reduced resistance and capacitance of the clock nets. The eighth column in Table II shows the merging ratios for the MBFF method and the alignment ratios for the proposed method. Owing to the tight timing constraints of the industrial IPs, the average MBFF merging ratio is 49.46%. On the other hand, the average alignment ratio of the proposed method is 95.68% due to the simple and efficient relocation algorithm.

The last three columns in Table II show the internal, switching, and total power for each clock network. The leakage power is not shown in the table since it is too small (under 0.1% of the total clock power). For all IPs, the proposed method reduces clock power significantly due to the reduction of wire capacitance and wire length. The reduction ratios compared to the original designs are 15.18%, 10.48%, 17.11%, 7.67%, and 20.13% for IP1, IP2, IP3, IP4, and IP5, respectively. These are much larger than those of the MBFF method. Fig. 14 depicts the average reduction ratio of clock power for the MBFF and the proposed method, with the original design as the baseline. The reduction ratios of the proposed method are 14.1% and 3.5% compared to the original method and the MBFF method, respectively, while MBFF achieves a 10.9% clock power reduction compared to the original method. The average total power reduction is 9.0%, with a maximum of 15.8%. The operating frequency of high-performance IPs such as CPUs is limited by higher power consumption, and the clock power accounts for the largest portion of the total power. Considering this fact, a 14.1% clock power reduction is a significant achievement in real industrial designs. In addition, the total power reduction by the proposed method would be higher when the operating frequency increases.

D. Overall Design Analysis

Timing analysis for the MBFF and proposed methods is shown in Table III. Both worst-negative-slack (WNS) and total-negative-slack (TNS) for setup timing are improved compared to the conventional methods after applying the proposed method. This is because the proposed method gives flip-flops shorter clock latencies than in the original case due to the reduction in wire length and wire capacitance. This allows the clock latency difference between the launch flip-flop and the capture flip-flop to become smaller because the shortened clock latencies are less affected by the clock uncertainty margin and the on-chip-variation (OCV) margin. The shortened clock latencies also allow a more aggressive use of the useful skew technique, which varies clock latencies to fix setup timing violations. For this reason, the proposed method is more effective at optimizing timing and power than the original method. There is no additional overhead on hold violations from the proposed method because the proposed flip-flop alignment is applied before CTS.

Total cell area is also reported in Table III, together with the low-voltage-threshold (LVT) cell area and the regular-voltage-threshold (RVT) cell area, to check the area overhead. In most
cases (IP1, IP2, IP3, and IP5), the LVT area is decreased after applying the proposed method. This is because some LVT cells can be replaced with RVT cells during power optimization due to the clock network quality improvement by the proposed method. The reason why both the LVT cell area and the RVT cell area are slightly increased after applying the proposed method in IP4 is that the proposed method fixed more setup timing violations. In Table III, both WNS and TNS for setup timing of the proposed method are much smaller than those of the original case. In this case, both the LVT cell area and the RVT cell area could increase. This results in lower leakage power consumption compared to the original and MBFF methods. The reason for the slight increase in the RVT cell area in IP1 and IP4 is that some LVT cells are changed to RVT cells for power optimization. The last column gives the number of DRC violations. Even though there is a small increase in DRC violations in IP1, it is not critical because the number of remaining DRC violations is negligibly small. IP3 and IP4 show far fewer DRC violations after applying the proposed method. Overall, there is no degradation in timing or physical constraints from the proposed algorithm, while there is a significant clock network improvement.

VI. CONCLUSION

In this article, we proposed a novel clock network optimization method to minimize the dynamic power of the clock network. It creates virtual tiles and finds the most effective columns for alignment. Then, flip-flops are relocated to the virtual tiles considering the minimum moving distances. This allows clock network synthesis to create a clock network that is much simpler and more effective by using straight nets and fewer vias. The wire capacitance and wire length are significantly reduced without any degradation in timing or other physical design constraints. Since flip-flops are relocated within very short distances and are not merged into an MBFF or register bank, the proposed method is more effective than the MBFF or register banking methods in post optimizations after clock network synthesis. The proposed method can be applied especially effectively to designs with a high operating frequency or a high cell density. Finally, the MBFF optimization method can be integrated with the proposed alignment method for further improvement of clock network power.

REFERENCES

[1] A. Kapoor et al., “Digital systems power management for high performance mixed signal platforms,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 61, no. 4, pp. 961–975, Apr. 2014.
[2] D. Duarte, N. Vijaykrishnan, and M. Irwin, “A clock power model to evaluate impact of architectural and technology optimizations—A summary,” IEEE Circuits Syst. Mag., vol. 3, no. 3, pp. 36–39, Jul. 2003.
[3] A. Vittal and M. Marek-Sadowska, “Low-power buffered clock tree design,” IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 16, no. 9, pp. 965–975, Sep. 1997.
[4] V. Sharma, “Minimum current consumption transition time optimization methodology for low power CTS,” in Proc. Design, Autom. Test Eur. Conf. Exhib. (DATE), Mar. 2015, pp. 412–416.
[5] C. Deng, Y. Cai, and Q. Zhou, “Fast synthesis of low power clock trees based on register clustering,” in Proc. 16th Int. Symp. Quality Electron. Design, Mar. 2015, pp. 303–309.
[6] A. Farshidi, L. Rakai, and L. Behjat, “An efficient optimal clock network buffer sizing with slew consideration,” in Proc. IEEE 30th Can. Conf. Electr. Comput. Eng. (CCECE), Apr. 2017, pp. 1–4.
[7] A. Rajaram and D. Z. Pan, “MeshWorks: A comprehensive framework for optimized clock mesh network synthesis,” IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 29, no. 12, pp. 1945–1958, Dec. 2010.
[8] G. Wilke and R. Reis, “A new clock mesh buffer sizing methodology for skew and power reduction,” in Proc. IEEE Comput. Soc. Annu. Symp. VLSI, Apr. 2008, pp. 227–232.
[9] M. R. Guthaus, G. Wilke, and R. Reis, “Non-uniform clock mesh optimization with linear programming buffer insertion,” in Proc. Design Autom. Conf., Jun. 2010, pp. 74–79.
[10] W. Liu, G. Chen, Y. Wang, and H. Yang, “Modeling and optimization of low power resonant clock mesh,” in Proc. 20th Asia South Pacific Design Autom. Conf., Jan. 2015, pp. 478–483.
[11] H. Chou, H. Yu, and S. Chang, “Useful-skew clock optimization for multi-power mode designs,” in Proc. IEEE/ACM Int. Conf. Comput.-Aided Design (ICCAD), Nov. 2011, pp. 647–650.
[12] S. Roy, P. M. Mattheakis, L. Masse-Navette, and D. Z. Pan, “Clock tree resynthesis for multi-corner multi-mode timing closure,” IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 34, no. 4, pp. 589–602, Apr. 2015.
[13] R. Shandilya and R. K. Sharma, “Low power positive-edge triggered d-type flip-flop,” in Proc. Int. Conf. Trends Electron. Informat. (ICEI), May 2017, pp. 1018–1023.
[14] P. Zhao, T. K. Darwish, and M. A. Bayoumi, “High-performance and low-power conditional discharge flip-flop,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 12, no. 5, pp. 477–484, May 2004.
[15] A. G. M. Strollo, E. Napoli, and D. D. Caro, “New clock-gating techniques for low-power flip-flops,” in Proc. Int. Symp. Low Power Electron. Design (ISLPED), Jul. 2000, pp. 114–119.
[16] T. Lee, D. Z. Pan, and J.-S. Yang, “Clock network optimization with multibit flip-flop generation considering multicorner multimode timing constraint,” IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 37, no. 1, pp. 245–256, Jan. 2018.
[17] Y. Shyu, J. Lin, C. Huang, C. Lin, Y. Lin, and S. Chang, “Effective and efficient approach for power reduction by using multi-bit flip-flops,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 21, no. 4, pp. 624–635, Apr. 2013.
[18] Z. Chen and J. Yan, “Utilization of multi-bit flip-flops for clock power reduction,” in Proc. 19th IEEE Int. Conf. Electron., Circuits, Syst. (ICECS), Dec. 2012, pp. 677–680.
[19] M. P.-H. Lin, C.-C. Hsu, and Y.-T. Chang, “Post-placement power optimization with multi-bit flip-flops,” IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 30, no. 12, pp. 1870–1882, Dec. 2011.
[20] S.-H. Wang, Y.-Y. Liang, T.-Y. Kuo, and W.-K. Mak, “Power-driven flip-flop merging and relocation,” IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 31, no. 2, pp. 180–191, Feb. 2012.
[21] C. Hsu, Y. Chen, and M. P. Lin, “In-placement clock-tree aware multi-bit flip-flop generation for power optimization,” in Proc. IEEE/ACM Int. Conf. Comput.-Aided Design (ICCAD), Nov. 2013, pp. 592–598.
[22] W. Shen, Y. Cai, X. Hong, and J. Hu, “Activity-aware registers placement for low power gated clock tree construction,” in Proc. IEEE Comput. Soc. Annu. Symp. VLSI (ISVLSI), Mar. 2007, pp. 383–388.
[23] W. Hou, D. Liu, and P. Ho, “Automatic register banking for low-power clock trees,” in Proc. 10th Int. Symp. Quality Electron. Design, Mar. 2009, pp. 647–652.
[24] A. Tang and N. K. Jha, “GenFin: Genetic algorithm-based multiobjective statistical logic circuit optimization using incremental statistical analysis,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 24, no. 3, pp. 1126–1139, Mar. 2016.
[25] D. Liu and C. Svensson, “Power consumption estimation in CMOS VLSI chips,” IEEE J. Solid-State Circuits, vol. 29, no. 6, pp. 663–670, Jun. 1994.
[26] A. Bonetti, N. Preyss, A. Teman, and A. Burg, “Automated integration of dual-edge clocking for low-power operation in nanometer nodes,” ACM Trans. Design Autom. Electron. Syst., vol. 22, no. 4, pp. 62:1–62:20, May 2017, doi: 10.1145/3054744.
[27] A. Farshidi, L. Behjat, L. Rakai, and D. Westwick, “A multiobjective cooptimization of buffer and wire sizes in high-performance clock trees,” IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 64, no. 4, pp. 412–416, Apr. 2017.
[28] S. Pullela, N. Menezes, and L. T. Pillage, “Low power IC clock tree design,” in Proc. IEEE Custom Integr. Circuits Conf., May 1995, pp. 263–266.
[29] Q. Wu, M. Pedram, and X. Wu, “Clock-gating and its application to low power design of sequential circuits,” IEEE Trans. Circuits Syst. I, Fundam. Theory Appl., vol. 47, no. 3, pp. 415–420, Mar. 2000.
[30] G. E. Tellez, A. Farrahi, and M. Sarrafzadeh, “Activity-driven clock design for low power circuits,” in Proc. IEEE Int. Conf. Comput.-Aided Design (ICCAD), Nov. 1995, pp. 62–65.
[31] J. Yan and Z. Chen, “Construction of constrained multi-bit flip-flops for clock power reduction,” in Proc. Int. Conf. Green Circuits Syst., Jun. 2010, pp. 675–678.
[32] Z. Chen and J. Yan, “Routability-driven flip-flop merging process for clock power reduction,” in Proc. IEEE Int. Conf. Comput. Design, Oct. 2010, pp. 203–208.
[33] T. Lee and T. Wang, “Congestion-constrained layer assignment for via minimization in global routing,” IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 27, no. 9, pp. 1643–1656, Sep. 2008.
[34] W.-H. Liu and Y.-L. Li, “Negotiation-based layer assignment for via count and via overflow minimization,” in Proc. 16th Asia South Pacific Design Autom. Conf. (ASPDAC). Piscataway, NJ, USA: IEEE Press, 2011, pp. 539–544. [Online]. Available: http://dl.acm.org/citation.cfm?id=1950815.1950924

David Z. Pan (Fellow, IEEE) received the B.S. degree from Peking University, Beijing, China, in 1992, and the M.S. and Ph.D. degrees from the University of California at Los Angeles (UCLA), Los Angeles, CA, USA, in 1998 and 2000, respectively. From 2000 to 2003, he was a Research Staff Member with IBM T. J. Watson Research Center, Yorktown Heights, NY, USA. He is currently an Engineering Foundation Professor with the Department of Electrical and Computer Engineering, The University of Texas at Austin, Austin, TX, USA. He has published over 350 journal articles and refereed conference articles and is the holder of eight U.S. patents. His research interests include cross-layer nanometer IC design for manufacturability, reliability, security, machine learning and hardware acceleration, and design/CAD for analog/mixed signal designs and emerging technologies.