Zahid Ullah(1), Manish K. Jaiswal(2), Ray C.C. Cheung(3), and Hayden K.H. So(4)
(1) Department of Electrical Engineering, CECOS University of IT and Emerging Sciences, Peshawar, Pakistan
(2,4) Department of Electrical and Electronic Engineering, The University of Hong Kong, Hong Kong
(3) Department of Electronic Engineering, City University of Hong Kong, Hong Kong
Emails: [email protected](1), [email protected](2), [email protected](3), [email protected](4)
Abstract-Ternary content-addressable memories (TCAMs) are high-speed memories; however, compared to static random access memories (SRAMs), TCAMs suffer from low storage density, relatively slow access time, poor scalability, complexity in circuitry, and higher cost. To access the benefits of SRAM, several SRAM-based TCAMs, specifically on field-programmable gate array (FPGA) platforms, were proposed. To further improve the performance of SRAM-based TCAMs, this paper presents UE-TCAM, which reduces memory requirement, latency, and power consumption, and improves speed. An example design of 512 x 36 of UE-TCAM has been implemented on a Xilinx Virtex-6 FPGA. Performance evaluation confirms a significant improvement in the proposed UE-TCAM, which achieves 100% reduction in 18K B-RAMs, 74.67% reduction in slice registers (SRs), 70.28% reduction in LUTs, 75.76% reduction in energy-delay product (EDP), and 60% reduction in latency, and improves speed by 70.85%, compared with the available SRAM-based TCAM.

I. INTRODUCTION
Ternary content-addressable memory (TCAM) provides access to stored data by contents (data word) rather than by an address and outputs the match address. A CAM searches its entire memory concurrently to check whether the given data word is stored anywhere in the CAM memory and returns a list of one or more storage addresses where the word was found. This fast search feature is the main influence behind using a CAM. The search operation can also be performed in regular random access memory (RAM) by iteratively reading and comparing all RAM entries for every search request. As a result, the search time using RAM is significantly longer than that of a CAM for the same search request.
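As a software analogy (not the paper's hardware design; the function and variable names below are illustrative), the following Python sketch contrasts the two lookup styles: a CAM-like index that maps a stored word directly to its addresses versus a RAM-style scan over every entry.

def cam_lookup(cam_index, word):
    """CAM-style lookup: the word itself selects its list of storage addresses."""
    return cam_index.get(word, [])

def ram_lookup(ram, word):
    """RAM-style lookup: iterate over every entry and compare it with the word."""
    return [addr for addr, stored in enumerate(ram) if stored == word]

ram = ["0110", "1010", "0110", "1111"]      # address -> stored word
cam_index = {}                              # stored word -> list of addresses
for addr, stored in enumerate(ram):
    cam_index.setdefault(stored, []).append(addr)

print(cam_lookup(cam_index, "0110"))        # [0, 2]
print(ram_lookup(ram, "0110"))              # [0, 2], but only after scanning every entry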
The high-speed search operation makes CAM an attractive choice for applications requiring high-speed search such as local-area networks, database management, pattern recognition, and artificial intelligence [1]. Recent applications include real-time pattern matching in virus-detection and intrusion-detection systems, gene pattern searching in bioinformatics, data compression, and image processing [2].

A. Problem statement
Although CAM technology presents a major advantage over standard RAM, namely a deterministic comparison in constant time, it also has shortcomings. For parallel search operation, CAM needs comparison circuitry in each cell, which dictates that CAM density lags RAM density. A typical TCAM cell has two SRAM cells and a comparison circuitry. A table of size 2^n x w needs 2^(n+1) x w SRAM cells and 2^n x w comparison circuitries, one for each TCAM cell. For large values of n, the TCAM size increases, which results in prohibitive power consumption, size, and cost, thus nullifying its advantage of high-speed lookup. The comparison circuitry in each cell not only makes TCAM expensive but also adds complexity to the TCAM architecture. The extra logic and capacitive loading due to the massive parallelism lengthen the access time of TCAM, which is over 3.3 times longer than the access time of SRAM [3].
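As a worked example (using the 512 x 36 configuration that is implemented later in this paper, i.e., n = 9 and w = 36), the expressions above evaluate to:

SRAM cells: 2^(n+1) x w = 2^10 x 36 = 36,864
Comparison circuitries: 2^n x w = 2^9 x 36 = 18,432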
Furthermore, TCAM is not subjected to the intense commercial competition found in the RAM market [4] and has yet to gain a substantial market share. TCAMs are expensive not only due to their low memory cell density but also due to their insignificant market demand, which means they are not produced in mass to drive their cost down. The cost of TCAM is about 30 times more per bit of storage than that of SRAM [3]. In addition, inherited architectural barriers also limit its total chip capacity. Complex integration of memory and logic also makes TCAM testing very time consuming [2].

CAMs have limited pattern retrieval capacity, and CAM technology does not evolve as fast as RAM technology. RAM technology is driven by many applications, particularly computers and consumer electronic products; hence, its cost per bit continuously decreases. CAM technology, in contrast, is considered specialized, and only a modest increase in bit capacity and a modest decrease in cost may be expected in the future [5]. TCAM does not scale well in terms of clock rate, power consumption, or chip density, whereas SRAM is scalable and less complex. The throughput of classical TCAMs is also limited by their relatively low speed [6].

B. Motivations and contributions

Field-programmable gate arrays (FPGAs) have a wide use in different applications [7] such as image processing [8], [9], networking systems [10], [11], and cryptography computations [12], [13], owing to several benefits such as reconfigurability, massive hardware parallelism, and rapid prototyping capability. SRAM-based FPGAs such as Xilinx Virtex-6 and Virtex-7 [14] provide high clock rates and a large amount of on-chip dual-port memory with configurable word width. The Xilinx Virtex-7 2000T FPGA is ideally suited for application-specific integrated circuit (ASIC) prototyping.

The Virtex-7 2000T provides equivalent capacity and performance to high-density ASICs, reduces board space requirements and complexity, and furthermore reduces system-level power consumption. The current FPGA technology does
not have hard IPs for the classical TCAMs; however, it does for SRAMs. The benefits of SRAM over CAM and the feasibility of FPGA technology have motivated us to pursue innovative designs of TCAM.

The proposed UE-TCAM architecture is built on the succession of our prior work on HP-TCAM [15], Z-TCAM [16], and E-TCAM [17]. The proposed work makes the following key contributions.
•  The architecture of the proposed TCAM is much simpler: it consists primarily of SRAM units with simple additional logic and is implemented on a state-of-the-art Xilinx FPGA.

•  The proposed UE-TCAM brings an enormous reduction in resource utilization. Implementation results illustrate that our UE-TCAM attains 100% reduction in 18K B-RAMs, 74.67% reduction in SRs, and 70.28% reduction in LUTs, compared with the available SRAM-based TCAM.

•  Energy/bit/search is a very useful performance metric for TCAM. Compared with the existing SRAM-based TCAM, the proposed TCAM achieves 58.58% reduction in energy consumption.

•  Latency is another important performance metric. The UE-TCAM also contributes by reducing latency by 60%, compared with the available SRAM-based TCAM.

•  Compared with the state-of-the-art SRAM-based TCAM design, the UE-TCAM also improves speed by 70.85%. Achieving higher throughput with a much simpler architecture is a key strength of the proposed work.
The proposed work may be used in network systems, web-enabled applications, and also in cloud computing. Other applications that can benefit from the proposed TCAM are data compression, image recognition processors, voice recognition processors, or any pattern recognition system in general. We expect that CAM technology will become mainstream for many applications in the near future. Thus, the use of CAM technology paves the way for our proposed work in the emerging applications.

C. Paper organization

The rest of the paper is organized as follows: Section II discusses related work. Section III explains hybrid partitioning, which realizes the architectures of the SRAM-based TCAMs. Section IV presents the architecture of the proposed UE-TCAM. Section V explains UE-TCAM operations. Section VI elaborates the operations of the proposed TCAM with examples. Section VII provides the implementation and performance evaluation of the UE-TCAM. Section VIII concludes the paper and also highlights our future work.
II. RELATED WORK

We surveyed the literature on RAM-based CAMs and, to the best of our knowledge, found very few works on the topic. The RAM-based CAM proposed in [4] uses a hashing technique and thus inherits the inborn disadvantages of hashing: collisions and bucket overflow. The number of stored elements has a great impact on the performance of the RAM-based CAM in [4]; as the number of stored elements increases, the performance of the method gradually degrades. Further, the method emulates binary CAM, not TCAM.
The method in [18] also exploits a hashing technique for TCAM. Being based on hashing, it likewise suffers from collisions and bucket overflow, which need additional area. If the overflow area has many records, then a search operation may not finish until many buckets are searched. Furthermore, when stored keys contain don't-care bits in the bit positions used for hashing, such keys must be duplicated in multiple buckets, which results in large memory; thus, the memory utilization is not efficient.
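To illustrate the duplication problem, the minimal Python sketch below (the hashing scheme and all names are illustrative, not taken from [18] or [4]) indexes buckets by two fixed bit positions of a ternary key; every don't-care bit that falls in a hashed position doubles the number of buckets the key must be written to.

from itertools import product

HASH_POSITIONS = (0, 1)   # illustrative choice: the bucket index is taken from these bit positions

def buckets_for_key(key):
    """Return every bucket index a ternary key ('0'/'1'/'x') must be duplicated into."""
    choices = [('0', '1') if key[p] == 'x' else (key[p],) for p in HASH_POSITIONS]
    return {''.join(bits) for bits in product(*choices)}

print(buckets_for_key('01x0'))   # {'01'}        -> one bucket, no duplication
print(buckets_for_key('x1x0'))   # {'01', '11'}  -> stored twice
print(buckets_for_key('xxx0'))   # four buckets  -> duplication grows as 2^(# of hashed don't-cares)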
A hashing technique also cannot provide deterministic performance due to potential collisions and is inefficient in handling wild-cards [19]. In contrast to the hash-based CAMs, the proposed TCAM provides a deterministic search performance and efficiently utilizes memory. SRAM-based pipelined CAMs also take multiple clock cycles to accomplish a search operation, and their memory utilization is not efficient either [20]. In contrast, our proposed TCAM has a deterministic throughput of a single clock cycle and also provides a better utilization of memory.

The RAM-based CAMs in [5] and [21] also have unavoidable shortcomings. The size of memory in both methods depends on the number of bits (nob) in the TCAM word. In [5], the required memory size would be 2^nob bits arranged in a column, so the size increases exponentially with the number of bits in the TCAM word. For instance, a 36-bit word needs 64 Gb of RAM. Such a huge memory results in prohibitive area, cost, and power consumption; thus, it makes the method practically infeasible for an arbitrarily large bit pattern. In contrast, the proposed design has a suitable partitioning scheme and efficiently supports arbitrarily large words.
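Taking the 2^nob-bit figure above at face value, the requirement for a 36-bit word works out to 2^36 = 68,719,476,736 bits, i.e., 64 Gb (8 GB) of RAM for a single lookup structure.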
In [21], an increase in the number of bits in the CAM word exponentially increases the memory size to a prohibitive limit, as in [5]. Furthermore, the RAM-based CAM in [21] works only on data arranged in ascending order, which is against the norm of real applications where data are totally random. To arrange data in ascending order, the original order of entries needs to be preserved, which is not considered in this method; if it were considered, the memory and power requirements would further increase. In contrast, our proposed TCAM supports an arbitrarily large bit pattern, preserves original addresses, and also provides a suitable partitioning methodology.

The CAM in [22] integrates CAM and RAM to get overall CAM functionality and thus inherits the inborn disadvantages of CAM. This scheme arranges the traditional TCAM table into groups based on some distinguishing bits in the TCAM words, so that each group can have at most one possible match. Since data in real applications are totally random, making such groups would be very time consuming. On the contrary, the proposed method provides a generic TCAM and uses SRAM, not CAM, to emulate the overall TCAM functionality.

The state-of-the-art SRAM-based TCAMs, HP-TCAM [15], Z-TCAM [16], and E-TCAM [17], are recently published. Our proposed UE-TCAM improves on them by lowering memory size, power consumption, and latency and, more importantly, provides higher throughput.
[Fig. 1. Conceptual view of hybrid partitioning (HP). (L: # of layers, N: # of vertical partitions.) The traditional TCAM table is divided into L x N hybrid partitions HP11 through HPLN.]

[Fig. 2. Architecture of UE-TCAM; layer architecture is shown in Fig. 3. The input word of C bits is partitioned into N subwords, each of w bits; the subwords address the L layers, whose potential match addresses feed a CAM priority encoder that outputs the match address. (L: # of layers, sw: subword, w: # of bits in subword, C: # of bits in the input word, PMA: potential match address, MA: match address.)]
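The splitting step shown in Fig. 2, where a C-bit input word is divided into N subwords of w bits each (C = N x w), can be sketched in a few lines of Python; the function name is illustrative and not part of the paper.

def partition_word(word, w):
    """Split a C-bit input word (a '0'/'1' string) into N = C / w subwords of w bits each."""
    assert len(word) % w == 0, "C must be a multiple of the subword width w"
    return [word[i:i + w] for i in range(0, len(word), w)]

# A 6-bit input word split into N = 3 subwords of w = 2 bits:
print(partition_word('011011', 2))   # ['01', '10', '11']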
1) SRAM unit: Each SRAM unit has a size of 2^w words x K bits, where K is the subset of original addresses from the conventional TCAM. The maximum number of possible combinations of w bits is 2^w, where each combination represents a subword; in our proposed TCAM, each subword acts as an address to its corresponding SRAM unit and invokes its corresponding row of K bits. The composition of the SRAM unit in the proposed architecture is shown in Table I, where a 1 shows the presence of a subword at an original address.

V. UE-TCAM OPERATIONS

A. Data mapping operation

The traditional TCAM table is logically partitioned column-wise (vertically) and row-wise (horizontally) into TCAM sub-tables using hybrid partitioning [15]. A partition may contain an x bit, which is first expanded into binary bits (0 and 1). Each subword, acting as an address, is applied to its corresponding SRAM unit, and the K bits are written at that memory location.
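A minimal software model of this mapping step is sketched below. The class and helper names are illustrative (the actual design writes these bit-vectors into FPGA block RAM); the x-bit expansion and the K-bit presence rows follow the description above.

from itertools import product

def expand_x(subword):
    """Expand a ternary subword ('0'/'1'/'x') into all matching binary subwords."""
    choices = [('0', '1') if b == 'x' else (b,) for b in subword]
    return [''.join(bits) for bits in product(*choices)]

class SramUnit:
    """Software model of one SRAM unit: 2^w rows, each row a K-bit presence vector."""
    def __init__(self, w, K):
        self.rows = {format(a, '0{}b'.format(w)): [0] * K for a in range(2 ** w)}

    def map_subword(self, subword, original_address):
        # Set the bit of the original TCAM address in every row the subword addresses.
        for address in expand_x(subword):
            self.rows[address][original_address] = 1

# Map a 2-bit ternary subword column of a partition that covers K = 4 original addresses.
unit = SramUnit(w=2, K=4)
unit.map_subword('0x', original_address=0)   # present at rows '00' and '01'
unit.map_subword('11', original_address=2)
print(unit.rows['01'])                       # [1, 0, 0, 0]
print(unit.rows['11'])                       # [0, 0, 1, 0]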
Thus, in this way, all the memory units are mapped. A subword in a partition may be present at multiple locations, so its original addresses are mapped to the corresponding bits in their respective memory units. Mapped bits are set high, while the remaining bits are set low.

[TABLE II. TRADITIONAL TCAM TABLE WITH HYBRID PARTITIONS.]

[TABLE III. DATA MAPPING EXAMPLE: SRAM UNITS IN LAYER 1 AND LAYER 2 OF UE-TCAM.]

[TABLE IV. SEARCHING IN LAYER 1 AND LAYER 2 IN UE-TCAM: SRAM unit21 and SRAM unit22 are read; K-bit ANDing result: 10; PMAs: PMA1 = 0, PMA2 = 2.]

TABLE V. OVERALL DATA SEARCH OPERATION IN UE-TCAM
Input: Subword1 = 01, Subword2 = 11
Step 1: Read out data from SRAM unit11; read out data from SRAM unit12.
Step 2: K-bit AND operation result = 00.
Step 3: Since the result of the K-bit AND operation is 0, a mismatch has occurred in layer 1.

1: Read all SRAM units concurrently
... layer 2. CPE selects PMA1 = 0 as MA, considering that it has ...
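The search steps summarized in Table V can be modeled in software as follows. This is a minimal sketch with illustrative names and example row contents (chosen only to reproduce the AND result of Table V, not taken from the paper's tables): each layer reads one K-bit row per SRAM unit, ANDs the rows bitwise, and the remaining set bit positions are that layer's potential match addresses.

def layer_search(sram_units, subwords):
    """Search one layer: read one K-bit row per SRAM unit and AND the rows bitwise.

    sram_units: one dict per vertical partition, mapping a binary subword -> K-bit row (list of 0/1).
    subwords:   the binary subwords of the input word, one per SRAM unit.
    Returns the layer's potential match addresses (an empty list means a mismatch).
    """
    result = list(sram_units[0][subwords[0]])                 # step 1: read out the rows
    for unit, sw in zip(sram_units[1:], subwords[1:]):
        result = [a & b for a, b in zip(result, unit[sw])]    # step 2: K-bit AND
    return [addr for addr, bit in enumerate(result) if bit]   # step 3: PMAs

# Layer 1 of the Table V example: Subword1 = '01', Subword2 = '11', K = 2.
layer1_units = [{'01': [1, 0], '11': [0, 1]},   # SRAM unit11 (only the relevant rows shown)
                {'01': [0, 1], '11': [0, 0]}]   # SRAM unit12
print(layer_search(layer1_units, ['01', '11']))  # [] -> AND result 00, mismatch in layer 1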
[Fig. 4. Resource utilization comparison on Xilinx Virtex-6 FPGA: SRs, LUTs, BRAMs (36K), and BRAMs (18K) for HP-TCAM, Z-TCAM, E-TCAM, and UE-TCAM.]

[Fig. 5. EDP comparison of the SRAM-based TCAMs: HP-TCAM, Z-TCAM, E-TCAM, and UE-TCAM.]
This paper presented an efficient SRAM-based TCAM architecture, UE-TCAM. We implemented a sample design of 512 x 36 of it on a Xilinx Virtex-6 FPGA. UE-TCAM consumes fewer memory resources and fewer logic elements on the FPGA, thus creating a much simpler TCAM structure. Compared with the available SRAM-based TCAMs, UE-TCAM shows significant reduction in size, power consumption, and latency and provides higher operating speed. For example, when compared with HP-TCAM [15], UE-TCAM brings 100% reduction in 18K B-RAMs, 74.67% reduction in SRs, 70.28% reduction in LUTs, 75.76% reduction in EDP, and 60% reduction in latency, and improves speed by 70.85%.

We understand that SRAM-based TCAM design is a rich field for research, and further investigation is necessary to find more SRAM-based TCAM approaches. We hope that the area will be further enriched by researchers in industry and academia. Our future work includes configuring the UE-TCAM for a precomparison access mode to further improve power efficiency, and using the proposed TCAM in some applications.

ACKNOWLEDGMENT

This work was partly supported by the Croucher Startup Grant (Grant No. 9500015).

REFERENCES

[1] M. Peng and S. Azgomi, "Content-addressable memory (CAM) and its network applications," in International IC-Taipei proceedings, Altera International Ltd.
[2] N. Mohan, W. Fung, D. Wright, and M. Sachdev, "Design techniques and test methodology for low-power TCAMs," Very Large Scale Integration (VLSI) Systems, IEEE Transactions on, vol. 14, no. 6, pp. 573-586, 2006.
[13] R. C. C. Cheung, N. Telle, W. Luk, and P. Y. K. Cheung, "Customizable elliptic curve cryptosystems," Very Large Scale Integration (VLSI) Systems, IEEE Transactions on, vol. 13, no. 9, pp. 1048-1059, Sept 2005.
[14] Xilinx, "Xilinx FPGAs," http://www.xilinx.com.
[15] Z. Ullah, K. Ilgon, and S. Baeg, "Hybrid partitioned SRAM-based ternary content addressable memory," Circuits and Systems I: Regular Papers, IEEE Transactions on, vol. 59, no. 12, pp. 2969-2979, 2012.
[16] Z. Ullah, M. Jaiswal, and R. Cheung, "Z-TCAM: An SRAM-based architecture for TCAM," Very Large Scale Integration (VLSI) Systems, IEEE Transactions on, vol. 23, no. 2, pp. 402-406, Feb 2015.
[17] --, "E-TCAM: An efficient SRAM-based architecture for TCAM," Circuits, Systems, and Signal Processing, vol. 33, no. 10, pp. 3123-3144, 2014.
[18] S. Cho, J. Martin, R. Xu, M. Hammoud, and R. Melhem, "CA-RAM: A high-performance memory substrate for search-intensive applications," in Performance Analysis of Systems & Software, 2007. ISPASS 2007. IEEE International Symposium on, 2007, pp. 230-241.
[19] W. Jiang, V. K. Prasanna, and N. Yamagaki, "Decision forest: A scalable architecture for flexible flow matching on FPGA," in Proceedings of the 2010 International Conference on Field Programmable Logic and Applications, ser. FPL '10, 2010, pp. 394-399.
[20] W. Jiang and V. K. Prasanna, "Large-scale wire-speed packet classification on FPGAs," in Proceedings of the ACM/SIGDA International Symposium on Field Programmable Gate Arrays, ser. FPGA '09, 2009, pp. 219-228.
[21] M. Somasundaram, "Circuits to generate a sequential index for an input number in a pre-defined list of numbers," Patent 7 155 563, December, 2006.
[22] --, "Memory and power efficient mechanism for fast table lookup," Patent 20 060 253 648, November, 2006.
[23] K. Pagiamtzis and A. Sheikholeslami, "Content-addressable memory (CAM) circuits and architectures: a tutorial and survey," Solid-State Circuits, IEEE Journal of, vol. 41, no. 3, pp. 712-727, 2006.
[24] Xilinx, "Xilinx XPower Analyzer," http://www.xilinx.com.