JugglePAC: a Pipelined Accumulation Circuit

Ahmad Houraniah, H. Fatih Ugurdag (Senior Member, IEEE), Furkan Aydin (Member, IEEE)

arXiv:2310.01336v2 [cs.AR] 16 Sep 2024

A. Houraniah is with the Dept. of Computer Science, Ozyegin University, Istanbul 34794, Turkey (email: [email protected]).
H. F. Ugurdag is with the Dept. of Electrical and Electronics Engineering, Ozyegin University, Istanbul 34794, Turkey.
F. Aydin is with the Dept. of Electrical and Computer Engineering, North Carolina State University, Raleigh, NC 27606, USA.

Abstract—Reducing a set of numbers to a single value is a fundamental operation in applications such as signal processing, data compression, scientific computing, and neural networks. Accumulation, which involves summing a dataset to obtain a single result, is crucial for these tasks. Due to hardware constraints, large vectors or matrices often cannot be fully stored in memory and must be read sequentially, one item per clock cycle. For high-speed inputs, such as rapidly arriving floating-point numbers, pipelined adders are necessary to maintain performance. However, pipelining introduces multiple intermediate sums and requires delays between back-to-back datasets unless their processing is overlapped. In this paper, we present JugglePAC, a novel accumulation circuit designed to address these challenges. JugglePAC operates quickly, is area-efficient, and features a fully pipelined design. It effectively manages back-to-back variable-length datasets while consistently producing results in the correct input order. Compared to the state-of-the-art, JugglePAC achieves higher throughput and lower area complexity, offering significant improvements in performance and efficiency.

Index Terms—Fully pipelined reduction circuits, floating-point number accumulation, field-programmable gate arrays, computer arithmetic.

TABLE I
Accumulation schedule for SimplePAC versus JugglePAC.
(in1/in2 are the operands fed to the adder in a given cycle; out is the sum emerging from the 3-cycle adder pipeline in that cycle; a0,3 denotes a0+a3 and a0:3 denotes a0+a1+a2+a3.)

         SimplePAC                 |          JugglePAC
Input   in1       in2   out        | Input   in1    in2    out
a0      a0        0                | a0
a1      a1        0                | a1      a0     a1
a2      a2        0                | a2
a3      a0        a3    a0         | a3      a2     a3
a4      a1        a4    a1         | a4                    a0,1
a5      a2        a5    a2         | a5      a4     a5
                        a0,3       | b0      a0,1   a2,3   a2,3
        a0,3      a1,4  a1,4       | b1      b0     b1
Stall                   a2,5       | b2                    a4,5
Stall                              | b3      b2     b3     a0:3
        a0,1,3,4  a2,5  a0,1,3,4   | b4      a4,5   a0:3   b0,1
b0      b0        0                | b5      b4     b5
b1      b1        0                | b6                    b2,3
b2      b2        0     a0:5       | b7      b6     b7     a0:5

I. Introduction

In the realm of modern computing, the ability to efficiently reduce a set of numbers to a single value is fundamental to a wide variety of computational applications. This process, known as reduction, is crucial for tasks such as signal processing [1], [2], data compression [3], [4], scientific computing [5], [6], and neural networks [7], [8]. Among the various types of reduction operations, accumulation, which involves summing a dataset to produce a single result, is one of the most essential. As data complexity and scale increase, high-performance accumulation methods that handle large datasets efficiently become increasingly important.

Accumulation operations can be applied to both floating-point and integer data. For integer data, the use of a 3:2 compressor simplifies the process by reducing latency, making integer accumulation straightforward. However, floating-point accumulation presents more complex challenges, necessitating pipelined adders to manage high data input rates. Pipelining, while essential for maintaining throughput in high-speed applications, introduces design complexities such as managing multiple intermediate sums and potential delays between consecutive datasets.

Table I depicts an example of a floating-point adder-based accumulator with a pipeline latency of 3 clock cycles, while new input values are fed in every cycle. Using a simple accumulation schedule (SimplePAC), we end up with 3 subsums, 3 cycles after all data inputs are fed to the adder. The subsums appear as a0,3, a1,4, and a2,5 in the SimplePAC out column of Table I. The simplest approach to accumulation would be not to allow a new dataset (denoted b) until the last addition of the current dataset has been fed into the adder pipeline, i.e., to introduce stalls between consecutive datasets. As a result, SimplePAC does not accept back-to-back datasets. Using a more elaborate schedule, we can alternate between additions from different sets to maintain a fully pipelined accumulation circuit. Such a schedule is shown in the JugglePAC columns of Table I, which start accumulating the consecutive dataset without requiring any stalls.

This work presents JugglePAC, a novel fully pipelined accumulation circuit that optimizes both area and timing using a single floating-point adder. We implement JugglePAC and evaluate its performance across multiple FPGAs, benchmarking it against state-of-the-art solutions. The major contributions of this work are:

1) We propose a fully pipelined accumulation circuit, namely JugglePAC, which efficiently handles high-speed, back-to-back variable-length datasets.
2) JugglePAC demonstrates improvements over existing solutions by achieving higher frequencies and reducing area complexity.
3) Our work introduces a dynamic scheduling mechanism for variable-length datasets, addressing the challenges associated with inefficient control logic in existing designs.
4) We implement JugglePAC on two different target FPGAs, specifically the Xilinx XC2VP30 and XC5VLX110T, showing consistent improvements in both area and timing over the state-of-the-art.
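The SimplePAC schedule is easy to reproduce in software. The following behavioral sketch (illustrative Python, not RTL; the function name and default latency are ours) feeds one value per cycle into a latency-p adder and adds each arriving value to whatever emerges from the pipeline. For the six-element dataset of Table I, it leaves the three interleaved subsums that still require a final reduction:

```python
from collections import deque

def simplepac_subsums(data, latency=3):
    """Behavioral sketch of SimplePAC (not RTL): one value arrives per
    cycle and is added to whatever emerges from the adder pipeline."""
    pipe = deque([None] * latency)   # the adder's pipeline stages
    for x in data:
        out = pipe.popleft()         # result emerging this cycle (or None)
        pipe.append(x if out is None else out + x)
    return [s for s in pipe if s is not None]

# With a0..a5 = 1..6, three partial sums (a0+a3, a1+a4, a2+a5) remain:
print(simplepac_subsums([1, 2, 3, 4, 5, 6]))   # prints: [5, 7, 9]
```

Reducing those leftover subsums through the same pipeline is exactly what forces the stalls shown in the SimplePAC half of Table I.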
II. Related Work

Recent work on floating-point accumulation circuits has focused on optimizing area, performance, and complexity. Early designs, such as Luo and Martonosi [9], used carry-save arithmetic and delayed adders for high-performance accumulation but were not fully pipelined, leading to performance bottlenecks due to the required stalls.

Vangal [10] improved on this by introducing a pipelined structure for single-precision floating-point multiply-accumulate operations, yet managing the pipeline complexity remained a challenge. He et al. [11] proposed a group alignment algorithm to improve accuracy but struggled with scalability and efficiency for variable-length datasets.

Nagar and Bakos [12] developed a double-precision accumulator with a coalescing reduction circuit, reducing complexity but limited by its FPGA-specific design. Zhuo et al. [13] introduced several designs, including the Fully Compacted Binary Tree (FCBT) and Dual Striped Adder (DSA), which managed multiple input sets but faced issues with buffer management and clock speed.

Huang and Andrews [14] proposed modular, fully pipelined architectures capable of handling arbitrary dataset sizes, though their designs faced challenges with underutilized pipelined adders and extensive buffering.

More recent designs [15]–[19] aimed at balancing area and timing performance. The design of [19] notably improved the area-timing product but required multiple BRAMs, impacting area efficiency.

In contrast, JugglePAC offers a novel approach with a fully pipelined architecture that simplifies control logic and handles variable-length datasets efficiently. Its dynamic scheduling mechanism ensures high frequency and low area complexity, outperforming previous designs in scalability and adaptability across different FPGA architectures.

III. JugglePAC

JugglePAC is a novel floating-point accumulation circuit designed to optimize performance and area. This section describes the microarchitecture of JugglePAC and its inter-dataset behavior, as well as key challenges such as the ability to mix variable-length datasets.

A. JugglePAC Microarchitecture

The JugglePAC architecture is built around a floating-point adder, which plays a crucial role in the accumulation process. The fundamental concept of JugglePAC involves scheduling the additions of serially arriving data to achieve efficient accumulation. Given that data typically arrives serially, the adder operates with a throughput of 1, performing input-pair additions every 2 cycles. This setup ensures that the adder is utilized 50% of the time by serial inputs, while producing results at least every 2 cycles. Consequently, a single adder is sufficient to keep pace with the input rate when additions are scheduled effectively. JugglePAC employs a state machine with two distinct states to manage the addition process. In the first state, the inputs are directly added. In the second state, the design handles the addition of any available pair of subsums, as illustrated in Fig. 1. The state machine alternates between these two states every cycle, with an exception for datasets of odd length, where the state machine remains in state 1 for an additional cycle.

To enhance performance for high-throughput applications, JugglePAC is designed to handle back-to-back inputs, thereby avoiding data pile-ups. This approach introduces additional complexity, as it necessitates the simultaneous processing of subsums from different datasets. To manage this, each subsum is assigned a unique label, an integer that increments with each new dataset (color-coded green in Fig. 1). The labeling is maintained using a shift register with a latency equal to that of the adder, as depicted in Fig. 1 and color-coded purple. The Matching Shift Register block matches subsums from the same dataset and distinguishes between those from different datasets. An additional pipeline stage, represented by dashes in the figure, is added before the inputs of the adder to increase the throughput.

Scheduling additions of subsums requires efficient control logic. The Pair Identifier (PI) block, shown in Fig. 1 and color-coded yellow, is responsible for managing this process. The PI receives the results from the adder along with their labels and schedules the additions accordingly. It uses a register for each label, storing incoming subsums and identifying pairs for addition. When a pair is identified, the PI schedules the addition and clears the corresponding register. This control logic allows JugglePAC to juggle between additions from different datasets while minimizing area requirements. The number of registers in the PI depends on the label size, with L representing the maximum number of labels. To handle situations where the adder may not always be available for scheduled additions, JugglePAC employs a FIFO buffer to temporarily store ready-to-add subsum pairs along with their labels. The FIFO is read every 2 cycles and has a maximum depth of ⌈log(p)⌉, where p is the adder's latency. The FIFO is built from registers, which keeps its data management efficient.

The state machine within JugglePAC manages the accumulation process while maintaining a low area and timing complexity. It alternates between adding serial input data and processing data from the FIFO. A sample scheduling scenario using the JugglePAC approach is illustrated in Table I.
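A cycle-level software model makes the two-state schedule concrete. The following behavioral sketch (illustrative Python, not RTL) models a latency-p adder pipeline, a Pair Identifier holding one unmatched subsum, and a FIFO of ready pairs. It deliberately simplifies the real design: labels are omitted, a single dataset is accumulated, and the FIFO is read whenever a pair is available rather than strictly every 2 cycles:

```python
from collections import deque

def jugglepac_sum(data, latency=3):
    """Behavioral sketch of JugglePAC for one dataset (not RTL).

    One value arrives per cycle. On the cycle that completes an input
    pair, the pair is fed to the adder (state 0); otherwise a ready
    pair of subsums is fed from the FIFO (state 1)."""
    if len(data) < 2:
        return (data[0] if data else 0), 0
    total_adds = len(data) - 1       # a reduction of n values needs n-1 adds
    pipe = deque([None] * latency)   # the adder's pipeline stages
    fifo = deque()                   # ready-to-add subsum pairs
    pending = None                   # Pair Identifier: one unmatched subsum
    inputs = deque(data)
    in_buf = None                    # first value of an input pair
    done = cycle = 0

    def stash(s):                    # pair a finished subsum, or hold it
        nonlocal pending
        if pending is None:
            pending = s
        else:
            fifo.append((pending, s))
            pending = None

    while True:
        cycle += 1
        out = pipe.popleft()         # result emerging from the adder
        if out is not None:
            done += 1
            if done == total_adds:   # last merge is the final sum
                return out, cycle
            stash(out)
        fed = None
        if inputs:
            x = inputs.popleft()     # one serial value per cycle
            if in_buf is None:
                in_buf = x
                if fifo:             # state 1: merge two subsums
                    a, b = fifo.popleft()
                    fed = a + b
            else:                    # state 0: add the completed pair
                fed, in_buf = in_buf + x, None
        else:
            if in_buf is not None:   # odd-length tail becomes a subsum
                stash(in_buf)
                in_buf = None
            if fifo:
                a, b = fifo.popleft()
                fed = a + b
        pipe.append(fed)
```

For the six-element dataset of Table I, `jugglepac_sum([1, 2, 3, 4, 5, 6])` returns the correct total without ever stalling the input stream; the same holds for odd-length datasets via the tail-promotion path.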
Table I demonstrates how the accumulation of dataset b0:N begins before the results of dataset a0:5 are completed. The simple control logic for the state machine and pair identification allows JugglePAC to maintain a low area complexity and a short critical path, outperforming the state-of-the-art.

For output identification, JugglePAC uses a counter to track the number of additions performed. The counter increments with each addition from state 1 and decrements with each addition from state 0. The system skips incrementing on the first addition of state 1, ensuring the counter returns to zero after each operation. The counter reaches a maximum value of ⌈(p+2)/2⌉, where p is the adder's latency, and returns to zero upon completing the final addition. Separate counters are used for each simultaneous accumulation, with L representing the maximum number of counters. The output identifier is shown in Fig. 1 and color-coded red.
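The two sizing quantities used by the design, the output-identifier counter's maximum ⌈(p+2)/2⌉ and the FIFO depth ⌈log(p)⌉, are cheap to check numerically. In the sketch below the latency value is hypothetical (deep FPGA floating-point adders commonly have on the order of 10 to 14 stages), and the logarithm is assumed to be base 2:

```python
import math

p = 14                                   # hypothetical adder pipeline latency
counter_max = math.ceil((p + 2) / 2)     # peak output-identifier counter value
fifo_depth = math.ceil(math.log2(p))     # FIFO slots, assuming a base-2 log
print(counter_max, fifo_depth)           # prints: 8 4
```

Both quantities grow slowly with p, which is why the control logic stays small even for deeply pipelined adders.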
Fig. 1. JugglePAC microarchitecture features a floating-point adder and key components including a state machine for managing additions, a Matching Shift Register for labeling subsums, and a Pair Identifier block for scheduling. A FIFO buffer ensures efficient data handling and prevents pile-ups, while an Output Identifier tracks the number of additions performed. This design optimizes throughput and minimizes area complexity.

B. Inter-Dataset Behavior

The JugglePAC architecture allows for the parallel processing of datasets, with the label size determining the maximum number of datasets that can be handled simultaneously. A smaller label size reduces area complexity but introduces a minimum dataset length requirement. When the number of datasets exceeds L, the circuit may mix accumulations from different datasets, leading to incorrect results. Therefore, the minimum dataset length is identified through comprehensive testing with variable-length datasets.

Increasing the label size enhances the circuit's capability to handle more datasets but also increases the logical resources required for the PI and output identification modules. This tradeoff between area complexity and minimum dataset length is summarized in Section IV, which shows the area complexity increasing with larger label sizes.
JugglePAC's dynamic accumulation approach results in latency that depends on the current and previous dataset lengths. For label sizes less than 3, the circuit maintains consistent latency, ensuring that the results are produced in input order. However, with larger label sizes, the minimum dataset length can lead to results being output in an order that deviates from the input sequence. Additional control logic can be used to reorder results based on the label or to output the label itself, allowing the system to identify the dataset to which each result belongs. Setting the minimum dataset length to 19 ensures that JugglePAC consistently produces ordered results. This design achieves a balance between high frequency and low area complexity, surpassing existing state-of-the-art solutions.

IV. Implementation Results

We evaluated the JugglePAC floating-point accumulation circuit against existing designs, focusing on area complexity, latency, throughput, and frequency. Table II summarizes these metrics for JugglePAC and other designs.

JugglePAC demonstrates a significant reduction in area complexity, using fewer slices than designs like MFPA and AeMFPA [14] and FAAC [15], with up to 71% less slice usage. Additionally, JugglePAC operates without BRAMs, unlike FCBT and DSA [13], contributing to its lower area complexity.

In terms of latency, JugglePAC performs competitively. For a label size of 2 and a minimum dataset length of 22, JugglePAC achieves a latency of approximately 1.077 µs, comparable to or better than most previous designs such as DSA and SSA [13]. Its throughput is also high, outperforming designs like DB [19] and BTTP [20], especially with larger datasets.

JugglePAC operates at a frequency of 208 MHz, exceeding many previous designs like FPACC [16] and FCBT [13]. It achieves the lowest "Slices×µs" score, reflecting superior efficiency in balancing area and performance.
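The "Slices×µs" column of Table II is simply the slice count multiplied by the worst-case latency in microseconds (cycle count divided by clock frequency). A quick bookkeeping check using the numbers from two of the JugglePAC rows:

```python
# Slice counts, frequencies (MHz), and worst-case cycle counts taken
# from the JugglePAC rows of Table II.
rows = [
    ("JugglePAC on XC2VP30 (label size 1)", 1439, 208, 220),
    ("JugglePAC on XC5VLX110T (label size 2)", 578, 334, 224),
]
for name, slices, f_mhz, cycles in rows:
    latency_us = cycles / f_mhz          # MHz = cycles per microsecond
    area_time = slices * latency_us      # the "Slices x us" metric
    print(f"{name}: {latency_us:.3f} us, {area_time:.0f} slices*us")
```

This reproduces the 1,522 and 388 area-time entries reported in Table II.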
TABLE II
Comparison with previously proposed accumulation circuits.
(Rows without an FPGA entry share the FPGA of the nearest entry above.)

Design        Label  Min. Dataset  Adders  Slices  BRAMs  Frequency  Total Latency       Slices×µs  FPGA
              Size   Length                               (MHz)      cycles   µs
MFPA [14]     -      -             4       4,991   2      207        198      0.957      4,776      XC2VP30
AeMFPA [14]   -      -             2       3,130   14     204        198      0.970      3,036
Ae2MFPA [14]  -      -             2       3,737   2      144        198      1.370      5,120
FAAC [15]     -      -             3       6,252   0      199        176      1.086      6,790
FCBT [13]     -      -             2       2,859   10     170        ≤475     ≤2.794     7,988
DSA [13]      -      -             2       2,215   3      142        232      1.634      3,619
SSA [13]      -      -             1       1,804   6      165        ≤520     ≤3.152     5,686
DB [19]       -      -             1       1,749   6      188        ≤199     ≤1.058     1,850
JugglePAC     1      74            1       1,439   0      208        ≤220     ≤1.058     1,522
JugglePAC     2      22            1       1,796   0      208        ≤224     ≤1.077     1,934
JugglePAC     3      10            1       2,343   0      208        ≤224     ≤1.077     2,523
FPACC [16]    -      -             -       683     -      247        -        -          -          VC5VSX50T
BTTP [20]     -      -             1       648     9.5    305        -        -          -          XC5VLX110T
JugglePAC     2      22            1       578     0      334        ≤224     ≤0.671     388
The full pipelining of JugglePAC enhances its performance and resource utilization, making it effective in high-speed operations. However, using a single floating-point adder might limit performance in cases requiring multiple adders. JugglePAC's minimum dataset length varies with label size, ensuring accurate accumulation but potentially limiting flexibility. Future work could explore the incorporation of multiple adders to address these limitations and enhance performance.

Overall, JugglePAC represents a significant advancement in floating-point accumulation circuits, with strong performance across key metrics. The results highlight its potential for various applications and provide a basis for future research and optimization.

V. Conclusion

Accumulation is a fundamental operation that appears in many types of computational workloads. In the case of high-throughput floating-point accumulation, complexities arise due to pipelining, especially when the system must handle consecutive datasets of varying lengths while producing results in the input order. Existing solutions often introduce significant overheads, resulting in a reduced clock frequency or increased area complexity. In this work, we have introduced JugglePAC, a novel fully pipelined reduction circuit designed to overcome these challenges. JugglePAC leverages a single floating-point adder and yet efficiently manages accumulation tasks. Implemented and evaluated across multiple FPGAs, JugglePAC consistently delivers superior results in area and timing simultaneously, while previous works deliver either competitive area or competitive timing, but not both at the same time.

References

[1] J. Smith, "Signal processing for large datasets," IEEE Trans. on Signal Processing, vol. 68, pp. 1234–1245, 2020.
[2] A. White and R. Miller, "Advanced techniques in signal processing for next-generation communication systems," in Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP), 2021, pp. 6453–6457.
[3] M. Johnson and S. Thompson, "Data compression techniques for big data applications," Journal of Data Sci., vol. 17, pp. 89–103, 2019.
[4] T. Miller and K. Davis, "Modern data compression: Algorithms and implementations," IEEE Trans. on Data Compression, vol. 10, pp. 234–245, 2020.
[5] C. Lee, "Scientific computing and its challenges in the era of big data," Computational Sci. Review, vol. 7, pp. 45–60, 2018.
[6] K. Thompson and B. Carter, "Computational methods in sci. and engineering: Advances and applications," Computational Sci. Review, vol. 35, 2022.
[7] A. Brown, "Neural networks and their applications in data reduction," Journal of Artif. Intel. Research, vol. 9, pp. 245–260, 2017.
[8] S. Davis and E. Wright, "Advances in neural networks for image recognition," in Proc. of the Conf. on Neural Information Processing Systems (NeurIPS), 2021, pp. 2456–2466.
[9] Z. Luo and M. Martonosi, "Accelerating pipelined integer and floating-point accumulations in configurable hardware with delayed addition techniques," IEEE Trans. on Computers, vol. 49, pp. 208–218, 2000.
[10] S. Vangal, Y. Hoskote, N. Borkar, and A. Alvandpour, "A 6.2-GFlops floating-point multiply-accumulator with conditional normalization," IEEE Journal of Solid-St. Circuits, vol. 41, pp. 2314–2323, 2006.
[11] C. He, G. Qin, M. Lu, and W. Zhao, "Group-alignment based accurate floating-point summation on FPGAs," in Proc. Int. Conf. Eng. Reconfig. Sys. and Algorithms (ERSA), vol. 6, 2006, pp. 136–142.
[12] K. K. Nagar and J. D. Bakos, "A high-performance double precision accumulator," in Int. Conf. on Field-Programmable Technology (FPT), 2009, pp. 500–503.
[13] L. Zhuo, G. R. Morris, and V. K. Prasanna, "High-performance reduction circuits using deeply pipelined operators on FPGAs," IEEE Trans. on Parallel and Dist. Sys., vol. 18, pp. 1377–1392, 2007.
[14] M. Huang and D. Andrews, "Modular design of fully pipelined reduction circuits on FPGAs," IEEE Trans. on Parallel and Dist. Sys., vol. 24, pp. 1818–1826, 2013.
[15] S. Sun and J. Zambreno, "A floating-point accumulator for FPGA-based high performance computing applications," in Proc. Int. Conf. on Field-Programmable Technology (FPT), 2009, pp. 493–499.
[16] T. Ould-Bachir and J.-P. David, "Performing floating-point accumulation on a modern FPGA in single and double precision," in Proc. IEEE Ann. Int. Symp. on Field-Programmable Custom Computing Machines (FCCM), 2010, pp. 105–108.
[17] Y. G. Tai, C. T. D. Lo, and K. Psarris, "An improved reduction algorithm with deeply pipelined operators," in Proc. IEEE Int. Conf. on Systems, Man and Cybernetics (SMC), 2009, pp. 3060–3065.
[18] ——, "Multiple data set reduction on FPGAs," in Proc. Int. Conf. on Field-Programmable Technology (FPT), 2010, pp. 45–52.
[19] ——, "Accelerating matrix operations with improved deeply pipelined vector reduction," IEEE Trans. on Parallel and Dist. Sys., vol. 23, pp. 202–210, 2012.
[20] L. Tang, Z. Huang, G. Cai, Y. Zheng, and J. Chen, "A novel reduction circuit based on binary tree path partition on FPGAs," Algorithms, vol. 14, p. 30, 2021.