CDC Verification
Chapter Introduction
Clock domain crossing (CDC) has become an ever-increasing problem in multi-clock
domain designs. One must solve issues not only at RTL level but also consider the
physical timing. This chapter will start with understanding of metastability and then
dive into different synchronizing techniques. It will also discuss the role of SystemVerilog
Assertions in verification of CDC. We will then discuss a complete methodology.
There are hardly any designs today that operate on a single clock. A typical SoC will
have three or more clocks, and these will be asynchronous. We have all done CDC
checks using lint tools, among others. But the problem is that there is a disconnect
between RTL static or simulation-based analysis and what we see in the physical
chip. The issue of metastability due to clock domain crossing is not very predictable
at RTL or gate level. Therefore, simulation does not accurately predict silicon
behavior, and critical bugs may escape the verification process. As a result,
almost 25% of all respins are due to clocking issues, CDC being chief among them.
Here’s an example of typical real-life designs and the number of clocks and CDC
signals they have. This is just a representative data point (Ping Yeung, Ph.D.).
This table goes to show the complexity of CDC verification. Both single-bit and
multi-bit synchronizations need to take place.
8.2 Metastability
The main culprit in CDC is the metastability of data that occurs when data crosses
from one clock domain to another. The first can be slower or faster compared to the
other clock domain. The data that crosses the boundary can end up violating setup/
hold requirements of the second clock domain. This is explained via Fig. 8.1. This
figure shows a synchronization failure that occurs when a TxData generated in
TxClk clock domain is sampled too close (setup violation) to the rising edge of
RxClk of the Rx logic domain. Synchronization failure is caused by an output going
metastable and not converging to a legal stable state.
When TxData violates the setup time of RxClk, RxData goes metastable, meaning
we do not know what state it will settle to, or whether it will settle at all within
one clock. If TxData is held “long” enough, RxData will eventually become stable and
end up in the correct state. For the sake of simplicity, I’ve shown the metastable
RxData stabilizing within one clock, but that may not be the case in all
instances. If the metastable RxData is fed directly into the forward logic, you do not
know what metastable state got propagated to the forward logic. Since the CDC
signal can fluctuate for some period of time, different gates in the receiving clock
domain might interpret the fluctuating signal as different logic values
and hence propagate erroneous signals into the receiving clock domain. In RTL
simulation, this metastable state will be regarded as an “X” (unknown) state (correctly
so), and the logic beyond RxDFF may be rendered useless (i.e., “X” propagation
will cause all sorts of issues in the logic).
In short, synchronization failure is caused by an output going metastable and not
converging to a legal stable state by the time the output must be sampled again.
8.3 Synchronizer
second flop will have a stable value and can propagate to the rest of the design
without unpredictability. Please refer to Fig. 8.3 to understand this scenario. To reit-
erate, the first flip-flop samples the asynchronous input signal into the new clock
domain and waits for a full clock cycle to permit any metastability on the stage-1
output signal to decay, and then the stage-1 signal is sampled by the same clock into
a second-stage flip-flop, with the intended goal that the stage-2 signal is now a sta-
ble and valid signal synchronized and ready for distribution within the new clock
domain.
A couple of implementation guidelines for the two-flop synchronizer:
1. There should not be any combinational logic between the Transmit DFF and the
Receive DFF. This allows for maximum metastability resolution time.
2. RxDFF1 and RxDFF2 synchronizer flops should be placed as close together as
possible during layout. Most companies nowadays offer predefined, laid-out, and
verified synchronizer macros that can be instantiated by hand in the RTL.
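As a rough illustration, the two-flop structure described above might be coded as follows. This is a minimal sketch, not a vendor macro: the module name, the reset port, and its polarity are assumptions made for the example, while the signal names follow the figures.

```systemverilog
// Minimal two-flop synchronizer sketch.
// Module name and reset port are illustrative assumptions.
module sync_2ff (
  input  logic RxClk,      // receive-domain clock
  input  logic RxReset_n,  // receive-domain asynchronous reset, active low (assumed)
  input  logic TxData,     // driven directly by a TxClk flop, no logic in between
  output logic RxData      // synchronized output for use by the Rx logic
);
  logic Rx1Data;           // stage-1 output; this is the node that may go metastable

  always_ff @(posedge RxClk or negedge RxReset_n) begin
    if (!RxReset_n) begin
      Rx1Data <= 1'b0;
      RxData  <= 1'b0;
    end else begin
      Rx1Data <= TxData;   // RxDFF1: samples the asynchronous input
      RxData  <= Rx1Data;  // RxDFF2: samples stage 1 one RxClk later
    end
  end
endmodule
```

Keeping both flops in one small module makes it easier to swap in a pre-verified library cell later and to keep the pair together during placement.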
For some very high-speed designs, the mean time between failures (MTBF) is too
short since the data may change before the second flop synchronizes the TxData. In
such cases, you may need three-flop synchronizers to compensate for the high
speed. Metastability may not settle down at RxDFF2 (Rx2Data) and hence the need
for the third flop (RxDFF3) (Fig. 8.4).
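The benefit of an extra stage can be estimated with the commonly used synchronizer MTBF model. The symbols below do not appear elsewhere in this chapter and are introduced only for illustration:

```latex
\mathrm{MTBF} = \frac{e^{\,t_r/\tau}}{T_W \cdot f_{clk} \cdot f_{data}}
```

Here t_r is the time allowed for metastability to resolve, tau and T_W are technology-dependent constants of the flip-flop, f_clk is the receive clock frequency, and f_data is the rate at which the CDC signal changes. Each additional synchronizer flop adds roughly one RxClk period to t_r, so the MTBF grows by a factor of about e^(T_RxClk/tau), while a faster receive clock shrinks t_r and raises f_clk at the same time.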
So far, we have seen synchronizers that work when both the transmit and the receive
clocks are of the same frequency. Note that if the transmit clock is slower than the
receive clock, the two (or three) flop synchronizers will work quite well. Recognizing
that sampling slower signals into faster-clock domains causes fewer potential prob-
lems than sampling faster signals into slower-clock domains, a designer might want
to take advantage of this fact by using simple two flip-flop synchronizers to pass
single CDC signals between clock domains.
But when the transmit clock is faster than the receive clock, there is the possibil-
ity that a signal from the transmit logic may change values twice before it can be
sampled or might be too close to the sampling edge of the slower receive clock
domain.
For the ensuing discussion, let us call the signal that needs synchronization the
CDC signal. That will make it easier to describe the concept. Here’s the two-flop
synchronizer (Fig. 8.5) for ease of reference.
Fig. 8.6 Faster transmit clock to slower receive clock: two-flop synchronizer won't work (waveforms: TxClk, TxData, RxClk, Rx1Data, RxData)
Fig. 8.7 Lengthened transmit pulse for correct capture in receive clock domain (waveforms: TxClk, TxData, RxClk, Rx1Data, RxData)
If the CDC signal is pulsed for only one fast-clock cycle, it could go high and
low between two rising edges of the slower clock and not be captured into
the slower-clock domain. This is shown in Fig. 8.6. In this figure, TxData goes high
and then low (one high pulse) within a single RxClk period. In other words, this
high pulse will not be captured by RxClk. As a result, Rx1Data remains
at the previously captured state of “0,” and so does RxData. The high pulse on
TxData is dropped by the receive logic, which will result in incorrect behavior in the
receive logic.
Hence, a two-flop synchronizer won’t work when the transmit clock is faster
than the receive clock.
One potential solution to this problem is to assert the TxData signal (i.e., the
CDC signal) for a period that exceeds the cycle time of the receive clock. This is
shown in Fig. 8.7. The general rule of thumb is that the minimum pulse width of the
transmit signal be 1.5x the period of the receive clock. The assumption is
that the CDC signal will then be sampled at least once by the receive clock. The issue
with this solution arises if an engineer mistakes it for a general-purpose
solution and misses the transmit (CDC) signal pulse-width requirement. This is
where SystemVerilog Assertions come into the picture. Put an assertion on the CDC
signal to check how long it is held when crossing from the high-frequency to the
low-frequency domain.
There are other solutions to tackle this problem, which are beyond the scope of
this book.
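A pulse-width check of this kind might look like the sketch below. The checker module, its port names, and the MIN_TX_CYCLES parameter are assumptions made for the example. The default of 4 corresponds to a hypothetical 100 MHz TxClk and 40 MHz RxClk: 1.5 x 25 ns = 37.5 ns, which rounds up to four 10 ns TxClk cycles.

```systemverilog
// Sketch of a hold-time check for a CDC signal leaving a faster TxClk domain.
// Module, port, and parameter names are illustrative assumptions.
module cdc_pulse_width_chk #(
  parameter int MIN_TX_CYCLES = 4  // smallest integer >= 1.5 * T_RxClk / T_TxClk
) (
  input logic TxClk,
  input logic TxReset_n,
  input logic TxData
);
  // A rising TxData must be sampled high on MIN_TX_CYCLES consecutive TxClk edges.
  property p_high_width;
    @(posedge TxClk) disable iff (!TxReset_n)
      $rose(TxData) |-> TxData [*MIN_TX_CYCLES];
  endproperty

  // A falling TxData must be sampled low on MIN_TX_CYCLES consecutive TxClk edges.
  property p_low_width;
    @(posedge TxClk) disable iff (!TxReset_n)
      $fell(TxData) |-> (!TxData) [*MIN_TX_CYCLES];
  endproperty

  a_high_width: assert property (p_high_width)
    else $error("TxData high pulse shorter than %0d TxClk cycles", MIN_TX_CYCLES);
  a_low_width: assert property (p_low_width)
    else $error("TxData low pulse shorter than %0d TxClk cycles", MIN_TX_CYCLES);
endmodule
```

Checking both polarities matters because a short low pulse is dropped by the slower domain in exactly the same way as a short high pulse.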
When passing multiple signals between clock domains, simple synchronizers do not
guarantee safe delivery of the data. A frequent mistake made by engineers when
working on multi-clock designs is passing multiple CDC bits required in the same
transaction from one clock domain to another and overlooking the importance of the
synchronized sampling of the CDC bits.
The problem is that multiple signals that are synchronized to one clock will
experience small data-changing skews that can occasionally be sampled on different
rising clock edges in a second clock domain. Even if we could perfectly control and
match the trace lengths of the multiple signals, differences in rise and fall times as
well as process variations across a die could introduce enough skew to cause sam-
pling failures on otherwise carefully matched traces.
Here are a couple of solutions to solve the multi-bit synchronization problem.
In-depth discussion of these solutions is out of scope of this book, but I highly rec-
ommend a SNUG paper by Cliff Cummings mentioned in the Bibliography (Clifford
E. Cummings).
1. The Gray Code Solution Where Multiple CDC Bits Are Passed Using Gray Codes
The safest counters that can be used in multi-clock designs are Gray Code coun-
ters. Gray Codes only allow one bit to change for each clock transition, eliminating
the problem associated with trying to synchronize multiple changing CDC bits
across a clock domain. Standard Gray Codes have very nice translation properties
to convert gray to binary and back again. Using these conversions, it is simple to
design efficient Gray Code counters.
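I am sure we are familiar with the Binary to Gray and Gray to Binary code
conversion formulas, but they are presented here for the sake of completeness.
4-bit Binary to Gray conversion:
g[3] = b[3]; g[2] = b[3] ^ b[2]; g[1] = b[2] ^ b[1]; g[0] = b[1] ^ b[0]
4-bit Gray to Binary conversion:
b[3] = g[3]; b[2] = b[3] ^ g[2]; b[1] = b[2] ^ g[1]; b[0] = b[1] ^ g[0]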
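As a rough sketch, the two standard conversions can be written as SystemVerilog functions. The package name, function names, and the small self-checking testbench are assumptions made for the example; the width is fixed at 4 bits to match the text.

```systemverilog
// Binary <-> Gray conversion sketch; names and the 4-bit width are illustrative.
package gray_pkg;
  localparam int W = 4;

  // Binary to Gray: g[W-1] = b[W-1], g[i] = b[i+1] ^ b[i]
  function automatic logic [W-1:0] bin2gray (input logic [W-1:0] b);
    return b ^ (b >> 1);
  endfunction

  // Gray to Binary: b[W-1] = g[W-1], b[i] = b[i+1] ^ g[i]
  function automatic logic [W-1:0] gray2bin (input logic [W-1:0] g);
    logic [W-1:0] b;
    b[W-1] = g[W-1];
    for (int i = W-2; i >= 0; i--)
      b[i] = b[i+1] ^ g[i];
    return b;
  endfunction
endpackage

// Self-checking testbench: round trip and single-bit change per increment.
module gray_tb;
  import gray_pkg::*;

  initial begin
    logic [W-1:0] b, g, g_next;
    for (int n = 0; n < 2**W; n++) begin
      b      = n[W-1:0];
      g      = bin2gray(b);
      g_next = bin2gray(b + 1'b1);   // wraps from the last code back to 0
      assert (gray2bin(g) == b)
        else $error("round trip failed for %0d", n);
      assert ($countones(g ^ g_next) == 1)
        else $error("more than one bit changed after %0d", n);
    end
    $display("gray code checks done");
  end
endmodule
```

The second check in the testbench is the property that makes Gray Codes safe to synchronize: consecutive codes, including the wrap-around, differ in exactly one bit.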
The Gray Code counters are used in this asynchronous FIFO design for the
Read_pointer and the Write_pointer, guaranteeing successful transfer of multi-bit data
from the write clock (aka the transmit clock) to the read clock (aka the receive clock).
Let us look at an asynchronous FIFO design that uses a Gray Code counter.
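`define FF_DLY 1  // flop clock-to-Q delay used below (value assumed)

module asynchronous_fifo (
  // Outputs
  fifo_out, full, empty,
  // Inputs
  wclk, wclk_reset_n, write_en,
  rclk, rclk_reset_n, read_en,
  fifo_in
);
  parameter D_WIDTH = 8;  // data width (value assumed)
  parameter A_WIDTH = 4;  // address width (value assumed)

  input wclk_reset_n;
  input rclk_reset_n;
  input wclk;
  input rclk;
  input write_en;
  input read_en;
  input [D_WIDTH-1:0] fifo_in;

  output [D_WIDTH-1:0] fifo_out;
  output full;
  output empty;

  reg full;
  reg empty;

  // Gray-coded pointers, one bit wider than the address. The pointer counters,
  // the memory array, and the full/empty comparisons are not part of this listing.
  reg [A_WIDTH:0] wr_ptr_gray, rd_ptr_gray;

  // Two-flop synchronizer registers for the pointers
  reg [A_WIDTH:0] rd_ptr_gray_wclk_q, rd_ptr_gray_wclk_q2;
  reg [A_WIDTH:0] wr_ptr_gray_rclk_q, wr_ptr_gray_rclk_q2;

  // check full: synchronize the Gray read pointer into the write clock domain
  always @ (posedge wclk or negedge wclk_reset_n)
    if (!wclk_reset_n)
      {rd_ptr_gray_wclk_q2, rd_ptr_gray_wclk_q} <= #`FF_DLY {{A_WIDTH+1{1'b0}}, {A_WIDTH+1{1'b0}}};
    else
      {rd_ptr_gray_wclk_q2, rd_ptr_gray_wclk_q} <= #`FF_DLY {rd_ptr_gray_wclk_q, rd_ptr_gray};

  // check empty: synchronize the Gray write pointer into the read clock domain
  always @ (posedge rclk or negedge rclk_reset_n)
    if (!rclk_reset_n)
      {wr_ptr_gray_rclk_q2, wr_ptr_gray_rclk_q} <= #`FF_DLY {{A_WIDTH+1{1'b0}}, {A_WIDTH+1{1'b0}}};
    else
      {wr_ptr_gray_rclk_q2, wr_ptr_gray_rclk_q} <= #`FF_DLY {wr_ptr_gray_rclk_q, wr_ptr_gray};
endmodule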
In the next section, we will see how to use SystemVerilog Assertions to make
sure that data are not dropped when write data (on write clock) are transferred
through Gray Code counter synchronization logic to read data (on read clock).
8.4 CDC Checks Using SystemVerilog Assertions
As we saw in Chap. 6, SystemVerilog Assertions (SVA) are a great way to check for
sequential domain conditions at clock (or sampling edge) boundaries. The CDC
signals crossing from one clock domain to another are perfect candidates for
checking with SVA. SVA fully supports multi-clock domain assertions as well as
multi-threaded local variables to make foolproof checkers that verify that your CDC
synchronizers (whatever the design style) work as promised. Note that the assertions
presented here can be used both for simulation-based checking and formal-based
checking (static functional). But I will focus on simulation-based checking since
formal/static functional verification is still not fully adopted by many engineering
groups and requires a complete chapter in itself.
Let us start with the simplest design. Later we will see a comprehensive
assertion for CDC multi-bit data transfer using the Gray Code counter-based asyn-
chronous FIFO described above.
Here’s a wonderful two-flop synchronizer repeated for the sake of convenience.
property TxData_stable;
  // After any change, TxData must hold its new value for two more TxClk cycles
  @(posedge TxClk) $changed(TxData) |=> $stable(TxData) [*2];
endproperty
Let us now see how to make sure that this two-flop single-bit synchronizer
correctly transfers data so that RxData === TxData after the metastability filter:
property Tx_to_Rx_CDC_DataCheck;
local Data;
First, the assertion checks that TxData has changed at posedge of TxClk. If it has,
we first store the TxData into the multi-threaded local variable Data. 1’b1 is required
because local data store must be attached to an expression. Since we don’t have any
condition, we simply say “always true” is the expression. “Always true” means
always store TxData into the data, whenever TxData changes. Then, we check at the
CDC boundary clock RxClk that the data has indeed transferred to Rx1Data by
comparing Rx1Data with the stored TxData (in the data). One clock later, the
RxData must match the TxData that was transmitted on TxClk. This guarantees that
the CDC 1-bit two-flop synchronization works as intended. Again, note that the
assumption of TxClk faster than RxClk must be adhered to.
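One possible way to write a check along these lines is sketched below. It is not necessarily the exact formulation the description refers to. The module wrapper is an assumption, and the sketch allows one RxClk edge for RxDFF1 to capture TxData before Rx1Data is compared. It also assumes that TxData is held stable until that capturing RxClk edge has passed.

```systemverilog
// Sketch only: one possible way to write the check described above.
// The module wrapper and the extra RxClk capture cycle are assumptions.
module cdc_2ff_data_chk (
  input logic TxClk,
  input logic RxClk,
  input logic TxData,
  input logic Rx1Data,
  input logic RxData
);
  property Tx_to_Rx_CDC_DataCheck;
    logic Data;                               // local variable, one copy per attempt
    @(posedge TxClk) $changed(TxData) |->
      (1'b1, Data = TxData)                   // always-true expression carrying the store
      ##1 @(posedge RxClk) 1'b1               // first RxClk edge: RxDFF1 captures TxData
      ##1 (Rx1Data === Data)                  // next RxClk edge: stage 1 holds the value
      ##1 (RxData  === Data);                 // one RxClk later: stage 2 holds the value
  endproperty

  a_cdc_data: assert property (Tx_to_Rx_CDC_DataCheck)
    else $error("RxData did not follow TxData across the CDC boundary");
endmodule
```

Because Data is a local variable, every change on TxData starts its own thread with its own stored value, so overlapping transfers are checked independently.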
As an exercise, see if you can write a simple assertion to check for glitch on
TxData. The above solution assumes no glitch on TxData.
Ok, now let us write a comprehensive assertion for a multi-bit Gray Code
counter-based data transfer across CDC region. This assertion is written for the
asynchronous FIFO design shown in Sect. 3.5. The write data are written to fifo_in
on wclk (write clock); and read from fifo_out on rclk (read clk). The assertion has
to make sure that whatever data were written into FIFO at the write pointer, the
same data is read out from FIFO when read pointer is equal to the write pointer:
sequence rd_detect(ptr);
  // Wait, for as long as it takes, for a read from the location 'ptr'
  ##[0:$] (read_en && !empty && (aff1.rd_ptr == ptr));
endsequence

property data_check(wrptr);
  integer ptr, data;
  @ (posedge wclk) disable iff (!wclk_reset_n || !rclk_reset_n)
    // On a valid write, remember the write pointer and the data written
    (write_en && !full, ptr=wrptr, data=fifo_in,
     $display($stime,"\t Assertion Disp wr_ptr=%h data=%h", aff1.wr_ptr, fifo_in))
    |=>
    // In the read clock domain, find the first read of that location
    @ (negedge rclk) first_match(rd_detect(ptr),
      $display($stime,,," Assertion Disp FIRST_MATCH ptr=%h Compare data=%h fifo_out=%h",
               ptr, data, fifo_out))
    ##0 (fifo_out === data);
endproperty
In this assertion, the data_check property checks whether a write occurs while the
FIFO is not full. If so, it saves wr_ptr into the local variable “ptr” and the data
written to the FIFO (fifo_in) into the local variable “data,” and displays both so that
we can easily see how the assertion is progressing during simulation.
If the antecedent is true, the consequent waits for the first match of rd_ptr being
the same as wr_ptr (note wr_ptr was stored in local variable ptr) and then checks that
the read data is the same as the write data (note write data were stored in local
variable data in the antecedent).
Sequence rd_detect(ptr) is used as the argument to first_match. It says: wait from
now, for as long as it takes, until you detect a read whose rd_ptr is equal to the
wr_ptr (which is stored in the local variable “ptr” in the antecedent).
Many such assertions can be written to see that your synchronizer design works.
As an exercise, try writing simple assertions for your synchronizer design.
8.5 CDC Verification Methodology
Metastability from the intermixing of multiple clock signals is not accurately
modeled by simulation. Unless you leverage exhaustive, automated clock domain
crossing (CDC) analyses to identify and correct problem areas, you will inevitably suffer
unpredictable behavior when the chip samples come back from the fab. Bottom line:
automated CDC verification solutions are mandatory for multi-clock designs.
Traditional CDC verification methods include manually inspecting RTL code for
the presence of synchronizers, running full timing simulations, sweeping clocks
against each other, and using special simulation models to randomly vary the delays
through synchronizers. These methods find only a subset of errors in a given design.
An effective CDC verification methodology should include structural, protocol,
and re-convergence fanout verification (Ping Yeung, Ph.D.).
Structural Verification
Each synchronizer must have the correct structure for the type of signal being sent
across clock domains. For example, a 2-DFF synchronizer is usually the best solu-
tion for single-bit signals but should not be used for multi-bit signals unless they are
gray-coded to ensure that only one bit changes at a time. Multi-bit signals may be
synchronized across domains using a separate control signal, an asynchronous
FIFO, or other methods. Also, there should be no combinational logic inside or
before a synchronizer.
Protocol Verification
Each synchronizer must follow a set of rules, called a transfer protocol, to ensure
that the CDC signal is properly transferred across clock domains. For example, even
the simplest 2-DFF synchronizer requires that the transmitting signal be held stable
long enough to guarantee that it is captured in the receiving domain. This may not
occur if the transmitting clock is faster than the receiving clock. Synchronization
structures for multi-bit signals require more complex protocol checks. When CDC
transfer protocols are violated, an error may not occur in simulation but will eventu-
ally occur in real hardware. Protocol analysis should be done using static formal
methods. SVA should be deployed to check for correct protocol adherence.
Re-convergence Fanout Verification (Fig. 8.8)
Re-convergence occurs when multiple signals are synchronized separately from one
clock domain to another and then used by the same logic in the receiving domain. If
that logic assumes a timing relationship between the signals, the design is not
tolerant of metastability and will eventually fail. This is because synchronizers
only “filter out” metastability so that unpredictable values are not seen by the
receiving logic; each synchronizer may still deliver its signal one receive clock
earlier or later than its neighbor, so the relative timing of the signals is not preserved.
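The pattern that a re-convergence check looks for can be sketched as follows. All module and signal names here are hypothetical and chosen only to illustrate the structure.

```systemverilog
// Re-convergence hazard sketch. All names are illustrative assumptions.
module reconv_hazard (
  input  logic RxClk,
  input  logic TxValid,    // two related signals launched together in the TxClk domain
  input  logic TxLast,
  output logic RxLastBeat
);
  logic valid_q1, valid_q2;
  logic last_q1,  last_q2;

  // Two independent two-flop synchronizers, each correct on its own
  always_ff @(posedge RxClk) begin
    {valid_q2, valid_q1} <= {valid_q1, TxValid};
    {last_q2,  last_q1}  <= {last_q1,  TxLast};
  end

  // Re-convergence: this AND assumes both signals arrive in the same RxClk
  // cycle. In silicon, either synchronizer may resolve one cycle later than
  // the other, so a one-cycle coincidence can be missed or falsely created.
  assign RxLastBeat = valid_q2 & last_q2;
endmodule
```

RTL simulation of this module looks clean because both paths always take exactly two RxClk cycles, which is why this class of problem needs structural analysis rather than plain simulation.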
[Figure: methodology block diagram with RTL, Static Formal STRUCTURAL analysis, Protocol analysis, Assertions, Result Database, and Debug blocks]
Let us see how we can combine structural analysis with protocol analysis to
come up with an automated comprehensive methodology. The following is a generic
diagram representing the automated process many EDA vendors now provide.
Figure 8.9 shows a proposed methodology. EDA vendors have implemented a similar
methodology (or are working toward one).
Identify RTL blocks (not the entire SoC RTL) that have CDC signals at play. Feed
such RTL blocks to the static formal structural analysis tool. This tool will identify
CDC synchronization “structures” within your logic and analyze to see if they meet
the requirements. For example, a single-bit CDC synchronization will work with a
two-flop synchronizer. But for a multi-bit synchronizer, the two-flop solution won’t
work. You may need an asynchronous FIFO-based solution or a gray counter (where
only 1 bit changes at a time). The tool will analyze such situations and provide a
structural analysis report. The results are also stored in a UCDB style database for
further debug analysis. This step should find issues with missing and incorrectly
implemented synchronizers and potential re-convergence problems.
More important in this step is to automate derivation of SystemVerilog Assertions.
For example, for a two-flop synchronizer, the input data should remain stable for at
least 1.5x the receive clock period. The structural analysis tool will (should) automatically
write such assertions for the next stage of protocol verification. There are many such
Once the structural analysis is complete, the assertions (created either automatically
or manually) will be input to the protocol analyzer. The static formal method employed
in the protocol analyzer will try all possible combinations of inputs (both in the temporal
and combinational domains) to the RTL block and see if any of the assertions
FAIL. These assertions ensure that the CDC signal is stable when going from the TX
to the RX domain and that multi-bit CDC data is either gray coded or stable when it is
sampled. The results will show failures, which need to be analyzed to correct the
synchronizer. Multiple iterations of this step will make sure that the logic will survive
under all conditions of input and that the metastability has been addressed.
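As an illustration of the kind of protocol assertion involved, a check that a multi-bit crossing really is gray coded might look like the sketch below. The checker module and its port names are assumptions; in the asynchronous FIFO shown earlier, TxGray would correspond to wr_ptr_gray checked on wclk, or rd_ptr_gray checked on rclk.

```systemverilog
// Protocol-check sketch for a Gray-coded multi-bit crossing.
// Module and port names are illustrative assumptions.
module cdc_gray_chk #(
  parameter int W = 5
) (
  input logic         TxClk,
  input logic         TxReset_n,
  input logic [W-1:0] TxGray    // registered Gray-coded value sent across the boundary
);
  // Between consecutive TxClk edges, at most one bit of TxGray may change.
  property p_gray_one_bit;
    @(posedge TxClk) disable iff (!TxReset_n)
      $onehot0(TxGray ^ $past(TxGray));
  endproperty

  a_gray_one_bit: assert property (p_gray_one_bit)
    else $error("more than one bit of TxGray changed in a single TxClk cycle");
endmodule
```

The same property can be used in simulation or handed to a formal tool, which is the dual use the methodology relies on.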
In addition to static formal, you may also want to simulate using the created
assertions. For example, you may feel more comfortable sweeping clocks to check
re-convergence logic. Or you may want to deploy the so-called static + simulation hybrid
methodology to check the structural integrity against the required protocol specification.
Of course, debug is a big part of this strategy. The results from structural and
protocol analysis are stored in a UCDB-style database. The debug tool will associate the
structure with the protocol and show the relationship. It will also help you debug
failing assertions. EDA tools do support such debug capabilities.
Based on the debug results, you will either change the RTL or change the input
test vectors and metastability injection strategy.
This loop will continue until there are no more assertions that fire and the meta-
stability issues are completely resolved.
This is what I call a proposed methodology. You may discuss it with EDA ven-
dors to see how close they come to it with their proposed solution.
The next problem is CDC at gate level. Gate-level simulations are notorious for
propagating an unknown “X” rapidly throughout the design. The two-flop
synchronizer can cause this “X” propagation problem. See Fig. 8.10 to understand this issue.
Fig. 8.10 (waveforms: TxClk, RxClk, TxData with a setup violation, Rx1Data metastable region going ‘X’)
8.7 EDA Vendors and CDC Tools Support
So what kind of industry tools are available to help a DV engineer tackle CDC
verification? Synopsys SpyGlass CDC and Mentor’s Questa CDC are two of the many
tools available in the EDA market. I’ve described only Mentor’s solution, because
Synopsys does not provide information on its SpyGlass CDC tool unless you register.
8.7.1 Mentor