0% found this document useful (0 votes)

31 views153 pages

Copyno1 Lastone February (Final)

Uploaded by

Sujala Patnaik

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as DOCX, PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

31 views153 pages

Copyno1 Lastone February (Final)

Uploaded by

Sujala Patnaik

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as DOCX, PDF, TXT or read online on Scribd

You are on page 1/ 153

UNIT 1: INTRODUCTION TO ARITHMETIC AND LOGICAL UNIT

1.1-History to ALU

1.2-Introduction to ALU

1.3-Operations of ALU

1.4-ALU Signals

1.5-Configurations of ALU

1.6-Uses of ALU

1.7-Advantages of ALU

1.8-Disadvantages of ALU

1.9-Different Bit Sizes in ALU

UNIT 2: LITERATURE SURVEY

UNIT 3: EXPLORING ALU ARCHITECTURES-BITWISE INSIGHTS

3.1-Functional Blocks of a Microprocessor

3.2-Organization of Arithmetic and Logical Unit

3.3-Introduction to N-bit ALU

3.4- 4 bit Arithmetic and Logical Unit

3.5- 8 bit Arithmetic and Logical Unit

3.6- 16 bit Arithmetic and Logical Unit

3.7-32 bit Arithmetic and Logical Unit

PAGE \* MERGEFORMAT 2
UNIT 4: INTRODUCTION TO VERILOG HDL
4.1- Verilog Introduction

4.2-History of Verilog

4.3- Verilog Abstraction Levels

4.4-Lexical Tokens

4.5-Data Types

4.6-Operators

4.7-Modules

4.8-Writing a testbench in Verilog

4.9-Gate Level Modeling

4.10-Data flow Modeling

4.11-Behavioral Modeling

4.12- Procedural Assignments

4.13-if statement in Verilog

4.14-Case statement in Verilog

4.15-Loops in Verilog

4.16- Verilog blocks

4.17- Switch Level Modeling

4.18- Tasks and Functions in Verilog

UNIT 5: INTRODUCTION TO XILINX

5.1-Xilinx Technology
5.2-History of Xilinx
PAGE \* MERGEFORMAT 2
5.3-Technology
5.4-Xilinx ISE Design Flow
5.5-Family Lines of Products

UNIT 6: VLSI Design - FPGA Technology

6.1-Introduction to FPGA

6.2-Architectural Design of FPGA in VLSI

6.3-Categories of FPGA
6.4-Advantages of FPGA in VLSI

6.5-Disadvantages of FPGA in VLSI

6.6-Applications of FPGA in VLSI

6.7-Need of an FPGA in VLSI

6.8-Gate Array Design

6.9-Standard Cell Based Design

6.10- Custom Design

UNIT 7: SIMULATION RESULTS
7.1- 16 BIT ALU Proposed Block Diagrams
7.2- 32 BIT ALU Proposed Block Diagrams
7.3- 8-BIT ALU Results

7.4- 16-BIT ALU Results

7.5- 32-BIT ALU Results

PAGE \* MERGEFORMAT 2
TITLE: DESIGN AND IMPLEMENTATION OF 32-BIT HIGH SPEED & LOW POWER ALU

ABSTRACT:
The Arithmetic Logic unit (ALU) is a fundamental building block of the central processing
unit (CPU) of a computer. The processors found inside modern CPUs and graphics
processing units (GPUs) accommodate very powerful and complex ALUs; a single
component may contain a number of ALUs. The Arithmetic Logic unit mainly performs
arithmetic and logic operations. The arithmetic operations include mainly addition,
subtraction, multiplication, division, modulus, increment and decrement. Whereas, the
logical operations include OR, AND, NOT, NOR, NAND, XOR, XNOR, Shift and Rotate
operations.
The proposed project work is primarily aimed at the design of a high-speed, low-power
16/32-bit Arithmetic and Logic Unit, typically deploying Verilog HDL. This work is
predominantly focussing on the design of modules specifically favouring low power
consumption, high speed, and small area.
Xilinx ISE 8.1i tool is proposed to be used for simulation, synthesis, floor planning and static
timing analysis aspects of the design till the generation of GDA file. This design can be used
in various applications such as microprocessors, digital signal processors, and embedded
systems.
SOFTWARE:
We will be using Xilinx ISE 8.1i software to design a high-speed, low-power 16/32 bit
Arithmetic and Logical Unit using Verilog HDL.

Project Guide: Team Members:

Dr. S. Sridhar M. Tech, Ph. D G. Dharani Naidu(20KD1A0458)
D. Rashmitha(20KD1A0447)
Ch. Madhu(20KD1A0440)

PAGE \* MERGEFORMAT 2
1. INTRODUCTION TO ARITHMETIC AND LOGICAL UNIT

1.1-HISTORY OF ALU:

Mathematician John von Neumann proposed the ALU concept in 1945 in a report on the
foundations for a new computer called the EDVAC.

The cost, size, and power consumption of electronic circuitry was relatively high throughout
the infancy of the information age. Consequently, all serial computers and many early
computers, such as the PDP-8, had a simple ALU that operated on one data bit at a time,
although they often presented a wider word size to programmers. One of the earliest
computers to have multiple discrete single-bit ALU circuits was the 1948 Whirlwind I,
which employed sixteen such "math units" to enable it to operate on 16-bit words.

In 1967, Fairchild introduced the first ALU implemented as an integrated circuit, the
Fairchild 3800, consisting of an eight-bit ALU with accumulator. Other integrated-circuit
ALUs soon emerged, including four-bit ALUs such as the Am2901 and 74181.These devices
were typically "bit slice" capable, meaning they had "carry look ahead" signals that
facilitated the use of multiple interconnected ALU chips to create an ALU with a wider word
size. These devices quickly became popular and were widely used in bit-slice
minicomputers.

Microprocessors began to appear in the early 1970s. Even though transistors had become
smaller, there was often insufficient die space for a full-word-width ALU and, as a result,
some early microprocessors employed a narrow ALU that required multiple cycles per
machine language instruction. Examples of this includes the most popular Zilog Z80, which
performed eight-bit additions with a four-bit ALU. Over time, transistor geometries shrank
further, following Moore’s law, and it became feasible to build wider ALUs on
microprocessors.

Modern integrated circuit (IC) transistors are orders of magnitude smaller than those of the
early microprocessors, making it possible to fit highly complex ALUs on ICs. Today, many
modern ALUs have wide word widths, and architectural enhancements such as barrel shifters
and binary multipliers that allow them to perform, in a single clock cycle, operations that
would have required multiple operations on earlier ALUs.

ALUs can be realized as mechanical, electro-mechanical or electronic and, in recent years,

research into biological ALUs has been carried out.

The 4-bit ALU contained the equivalent of 75 logic gates and it was the very first to exist in
a single package. Today’s ALUs are more complex and have added features like barrel
shifters and binary multipliers, effectively making them capable of performing a higher
volume of more complex operations in a shorter amount of time.

PAGE \* MERGEFORMAT 2
PAGE \* MERGEFORMAT 2
1.2-INTRODUCTION TO ALU

An arithmetic-logic unit is the part of a central processing unit that carries out arithmetic and
logic operations on the operands in computer instruction words.

In some processors, the ALU is divided into two units: an arithmetic unit (AU) and a logic
unit (LU). Some Processors contain more than one AU. For example, one for fixed-point
operations and another for floating-point operations. In computer systems, floating-point
computations are sometimes done by a floating-point unit (FPU) on a separate chip called a
numeric coprocessor.

Typically, the ALU has direct input and output access to the processor controller, main
memory (random access memory or RAM in a personal computer) and input/output devices.
Inputs and outputs flow along an electronic path that is called a bus.

The input consists of an instruction word, sometimes called a machine instruction word, that
contains an operation code or "opcode," one or more operands and sometimes a format code.
The operation code tells the ALU what operation to perform and the operands are used in the
operation. For example, two operands might be added together or compared logically. The
format may be combined with the opcode and tells, for example, whether this is a fixed-point
or a floating-point instruction.

The output consists of a result that is placed in a storage register and settings that indicate
whether the operation was performed successfully. If it isn't, some sort of status will be
stored in a permanent place that is sometimes called the machine status word.

In general, the ALU includes storage places for input operands, operands that are being
added, the accumulated result (stored in an accumulator) and shifted results. The flow of bits
and the operations performed on them in the subunits of the ALU are controlled by gated
circuits.

The gates in these circuits are controlled by a sequence logic unit that uses a particular
algorithm or sequence for each operation code. In the arithmetic unit, multiplication and
division are done by a series of adding or subtracting and shifting operations.

There are several ways to represent negative numbers. In the logic unit, one of 16 possible
logic operations can be performed such as comparing two operands and identifying where
bits don't match.

The design of the ALU is a critical part of the processor and new approaches to speeding up
instruction handling are continually being developed.

PAGE \* MERGEFORMAT 2
1.3-OPERATIONS OF ALU

In computer science, ALUs serve as a combinational digital circuit that performs arithmetic
and bitwise operations on binary numbers. This is a foundational building block of arithmetic
logic circuits for numerous types of control units and computing circuits including central
processing units (CPUs), FPUs and graphics processing units.

Long before modern PCs, ALUs first helped to support microprocessors and transistors in
the 1970s.

PAGE \* MERGEFORMAT 2
PAGE \* MERGEFORMAT 2
The following are a few examples of bitwise logical operations and basic arithmetic
operations supported by ALUs:

Addition:
Adds A and B with carry-in or carry-out sum at Y.

Subtraction:
Subtracts B from A or vice versa with the difference at Y and carry-in or carry-out.

Increment:
Where A or B is increased by one and Y represents the new value.

Decrement:
Where A or B is decreased by one and Y represents the new value.

AND:
The bitwise logic AND of A and B is represented by Y.

OR:
The bitwise logic OR of A and B is represented by Y.

Exclusive-OR:
The bitwise logic XOR of A and B is represented by Y.

Shift:
ALU shift functions cause A or B operands to shift, either right or left, with the new operand
represented by Y. Complex ALUs utilize barrel shifters to shift A or B operands by any
number of bits in a single operation.

Rotate:

PAGE \* MERGEFORMAT 2
The rotate operand is treated as a circular buffer of bits so its least and most significant bits
are effectively adjacent. Rotations are generally less useful than shifts.

Rotate through carry:

The carry bit and operand in the rotate through carry are collectively treated as a circular
buffer of bits.

1.4-ALU SIGNALS

The ALU contains a variety of electrical input and output connections, which result in the
digital signals being cast between the ALU and the external electronics. External circuits
send signals to the ALU input, and the ALU sends signals to the external electronics.

Opcode: The operation selection code specifies whether the ALU will conduct arithmetic or
a logic operation when it performs the operation.

Data: The ALU contains three parallel buses, each with two input and output operands.
These three buses are in charge of the same amount of signals.

Status
Input: Once ALU has completed the operation, then the status inputs allow the ALU to
obtain more information needed to complete the process successfully. A single “carry-in” is
used, which is a stored carry-out from the prior ALU operation.

Output: The status outputs, which are numerous signals, offer the results of an ALU
operation in the form of extra data. Overflow, carry out, zero, negative, and other status
signals are usually handled by general ALUs. The status output signals were stored in the
external registers after the ALU completed each operation. These signals are saved in
external registers, which allows them to be used in future ALU operations.

PAGE \* MERGEFORMAT 2
1.5-CONFIGURATIONS OF ALU

The following is a description of how the ALU interacts with the processor. These
configurations are included in every arithmetic logic unit:

 Accumulator
 Instruction Set Architecture
 Stack
 Register Stack
 Register to Register
 Register Memory

Accumulator
In the context of a computer's Arithmetic Logic Unit (ALU), an accumulator is a register that
stores the results of arithmetic and logic operations. The accumulator is a special-purpose
register designed to perform arithmetic and logic operations efficiently. It plays a crucial role
in many central processing unit (CPU) architectures.
Here are the key functions and characteristics of an accumulator in an ALU:
1. Storage of Interim Results: The accumulator temporarily holds the results of arithmetic
and logic operations. For example, when you perform an addition operation, the sum is
stored in the accumulator.

2. Default Operand: In some instruction sets and architectures, the accumulator serves as a
default operand for arithmetic and logic operations. This means that one of the operands for
an operation is implicitly the content of the accumulator.

PAGE \* MERGEFORMAT 2
3. Accumulative Operations: The accumulator is often used in accumulative operations,
where the result of the previous operation becomes one of the operands for the next
operation. This is particularly useful in iterative calculations.

4. Simplifying Instruction Encoding: Using an accumulator can simplify the encoding of

instructions since it provides a default operand. This can reduce the number of bits needed to
represent instructions, making the instruction set more compact.

5. Example: In a simple instruction set architecture (ISA), an addition operation might be

encoded as "ADD A, B," where A is the accumulator, and B is another operand. The result of
the addition is stored back in the accumulator.
Here's a simple example in assembly language:
LOAD A, 5; Load the accumulator with the value 5
ADD B, 3; Add the value 3 to the accumulator
STORE C, A; Store the content of the accumulator in memory location C

In this example, the accumulator (A) is used for intermediate storage during the addition
operation.

It's important to note that while the accumulator is a common design choice, not all
architectures use this approach. Some processors have a set of general-purpose registers
where operands and results are manipulated. The use of an accumulator is just one design
decision among many when designing an ALU and CPU architecture.

Instruction Set Architecture (ISA)

The Instruction Set Architecture (ISA) is the part of the processor that is visible to the
programmer or compiler writer. The ISA serves as the boundary between software and
hardware. We will briefly describe the instruction sets found in many of the microprocessors
used today. The ISA of a processor can be described using 5 categories:

 Operand Storage in the CPU

 Number of explicit named operands
 Operand location
 Operations
PAGE \* MERGEFORMAT 2
 Type and size of operands

Of all the above the most distinguishing factor is the first.

The 3 most common types of ISAs are:

1. Stack - The operands are implicitly on top of the stack.

2. Accumulator- One operand is implicitly the accumulator.
3. General Purpose Register (GPR) - All operands are explicitly mentioned, they are
either registers or memory locations.

Stack

A stack is a data structure that follows the Last In, First Out (LIFO) principle. It means that
the last item added to the stack is the first one to be removed. Stacks are often used in
computer architecture to manage data and control flow during program execution.

Whenever the latest operations are performed, these are stored on the stack that holds
programs in top-down order, which is a small register. When the new programs are added to
execute, they push to put the old programs.

In the context of an ALU or a processor in general, a stack can be used for various purposes,
including:

Function Calls and Subroutines: When a program calls a function or a subroutine, the
return address and local variables are often pushed onto the stack. This allows the processor
to return to the correct point in the program after the function or subroutine completes.

Operand Storage: Some processors use a stack to store operands for arithmetic operations.
For example, postfix notation (also known as Reverse Polish Notation or RPN) uses a stack
to store operands and operators, making it easier to evaluate expressions.

Interrupt Handling: When an interrupt occurs, the processor may push the current state of
the program onto the stack before servicing the interrupt. After handling the interrupt, the
processor can restore the program state from the stack.

Register Spilling: In some cases, when there are not enough registers available to store
variables, a stack can be used to spill excess data. This involves pushing some register
contents onto the stack and later popping them back when needed.

While the ALU itself is responsible for performing arithmetic and logic operations on data,
the stack is a separate component that helps manage the flow of data and control during
program execution. The stack can be implemented in hardware, software, or a combination
of both, depending on the processor architecture.
PAGE \* MERGEFORMAT 2
Register Stack

A stack register is a computer central processor register whose purpose is to keep track of a
call stack. On an accumulator-based architecture machine, this may be a dedicated register.
On a machine with multiple general-purpose registers, it may be a register that is reserved by
convention, such as on the IBM System/360 through Architecture and RISC architectures, or
it may be a register that procedure call and return instructions are hardwired to use, such as
on the PDP-11, VAX, and Intel x86 architectures. Some designs such as the Data General
Eclipse had no dedicated register, but used a reserved hardware memory address for this
function.

Machines before the late 1960 such as the PDP-8 and HP 2100 did not have compilers which
supported recursion. Their subroutine instructions typically would save the current location
in the jump address, and then set the program counter to the next address. While this is
simpler than maintaining a stack, since there is only one return location per subroutine code
section, there cannot be recursion without considerable effort on the part of the programmer.

A stack machine has 2 or more stack registers, one of them keeps track of a call stack, the
others keep track of other stacks.

Generally, the combination of Register and Accumulator operations is known as for Register
- Stack Architecture. The operations that need to be performed in the register-stack
Architecture are pushed onto the top of the stack. And its results are held at the top of the
stack. With the help of using the Reverse polish method, more complex mathematical
operations can be broken down. Some programmers, to represent operands, use the concept
of a binary tree. It means that the reverse polish methodology can be easy for these
programmers, whereas it can be difficult for other programmers. To carry out Push and Pop
operations, there is a need to be new hardware created.

Register to Register
It includes a place for 1 destination instruction and 2 source instructions, also known as a 3-
register operation machine. This Instruction Set Architecture must be more in length for
storing three operands, 1 destination and 2 sources. After the end of the operations, writing
the results back to the Registers would be difficult, and also the length of the word should be
longer. However, it can be caused to more issues with synchronization if write back rule
would be followed at this place.

The MIPS component is an example of the register-to-register Architecture. For input, it uses
two operands, and for output, it uses a third distinct component. The storage space is hard to
maintain as each needs a distinct memory; therefore, it has to be premium at all times.
Moreover, there might be difficult to perform some operations.

PAGE \* MERGEFORMAT 2
Register Memory
Register memory is the smallest and fastest memory in a computer. It is not a part of the
main memory and is located in the CPU in the form of registers, which are the smallest data
holding elements. A register temporarily holds frequently used data, instructions, and
memory address that are to be used by CPU. They hold instructions that are currently
processed by the CPU. All data is required to pass through registers before it can be
processed. So, they are used by CPU to process the data entered by the users.

Registers hold a small amount of data around 32 bits to 64 bits. The speed of a CPU depends
on the number and size (no. of bits) of registers that are built into the CPU. Registers can be
of different types based on their uses. Some of the widely used Registers include
Accumulator or AC, Data Register or DR, the Address Register or AR, Program Counter
(PC), I/O Address Register, and more.

1.6-USES OF ALU

Arithmetic logic units are integral to nearly every computing system capable of processing
data. It is found in all advanced processors, and its core uses include the following:

PAGE \* MERGEFORMAT 2
 PCs and laptops: In standard computers, the ALU performs the computations and
comparisons necessary to run a wide variety of software programs. This includes word
processing, spreadsheet, and graphics applications.
 Mainframes and servers: The ALU is used by main frames and servers to conduct the
necessary computations and comparisons to process client requests and provide the
appropriate responses.
 Embedded systems and IoT: The ALU is employed in embedded devices intended to
execute a single function inside a broader device or ecosystem. Examples include airplane
control mechanisms, medical devices, automobile systems, and Internet of Things
(IOT) enabled technology.
 Mobile devices: An ALU in mobile devices like smartphones and tablets performs
computations and comparisons, similar to what it does on personal computers. This
allows individuals to browse the internet and execute programs.

ALUs, in addition to doing addition and subtraction calculations, also handle the process of
multiplication of two integers because they are designed to perform integer calculations;
thus, the result is likewise an integer. Division operations, on the other hand, are frequently
not done by ALU since division operations can result in a floating-point value. Instead,
division operations are normally handled by the floating-point unit (FPU), which may also
execute other non-integer calculations.

Engineers can also design the ALU to do any operation they choose. However, as the
operations become more sophisticated, ALU becomes more expensive since it generates
more heat as well as takes up more space on the CPU. Therefore, engineers create powerful
ALUs, ensuring that the CPU is both quick and powerful.

The ALU performs the computations required by the CPU; most of the operations are logical
in nature. If the CPU is built more powerful, it will be designed on the basis of the ALU.
Then it generates more heat and consumes more energy or power. As a result, there must be a
balance between how intricate and strong ALU is and how much it costs. The primary reason
why faster CPUs are more expensive is that they consume more power and generate more
heat due to their ALUs. The ALU’s major functions are arithmetic and logic operations, as
well as bit-shifting operations.

1.7-Advantages of ALU

 It supports high-performance parallel architecture and applications.

 It can provide the desired output at the same time and combine integers and floating-point
variables.
PAGE \* MERGEFORMAT 2
 It has the ability to carry out instructions on a large number of items and has a high level
of precision.
 The ALU can combine two arithmetic operations in the same code, such as multiplication
and addition or subtraction and addition, or any two operands. A+B*C is an example.
 They remain consistent throughout the presentation, and they’re spaced in such a way that
they do not interrupt any of the segments.
 It is, in general, highly rapid, and as a result, it produces results swiftly.
 With ALU, there are no sensitivity difficulties or memory wastage.
 They are less costly and reduce the number of logic gates required.
 There are no sensitivity issues with ALU.
 There is no memory wastage with ALU.
1.8-Disadvantages of ALU

 With the ALU, floating variables have more delays, and the designed controller is not
easy to understand.
 The bugs would occur in our result if memory space were definite.
 It is difficult to understand amateurs as their circuit is complex; also, the concept of
pipelining is complex to understand.
 A proven disadvantage of ALU is that there are irregularities in latencies.
 Another demerit is rounding off, which impacts accuracy.

1.9-Different Bit Sizes in ALU

An Arithmetic Logic Unit (ALU) is a digital circuit that performs arithmetic and logical
operations on binary numbers. The "bit" in these terms refers to the size of the data that the
ALU can process in a single operation. Here's a brief explanation of 4-bit, 8-bit, 16-bit, and
32-bit ALUs:

1. 4-bit ALU:

- Can process binary numbers that are 4 bits in length.

- Performs arithmetic and logical operations on two 4-bit inputs.

- Examples of operations include addition, subtraction, AND, OR, XOR, etc.

2. 8-bit ALU:

PAGE \* MERGEFORMAT 2
- Handles binary numbers that are 8 bits in length.

- Supports a wider range of operations compared to a 4-bit ALU.

- Common in early microprocessors and simple computer systems.

3. 16-bit ALU:

- Processes binary numbers of 16 bits in length.

- Offers increased precision and a larger range of representable values compared to smaller

bit ALU’S.

- Used in more complex systems where greater computational power is required.

4. 32-bit ALU:

- Designed to work with binary numbers that are 32 bits long.

- Provides even higher precision and a larger range of values.

- Common in modern microprocessors and computer architectures.

The bit size of an ALU is a critical factor in determining the computational capabilities of a
processor. Larger bit sizes generally allow for more complex calculations and the
manipulation of larger data sets. However, they may also come with increased hardware
complexity and power consumption. The choice of ALU bit size depends on the specific
requirements of the application and the desired balance between computational power and
hardware complexity.

PAGE \* MERGEFORMAT 2
2. LITERATURE SURVEY
1. A. Vahid Khorasani Et al. [1] researched about emphasizes reliable communication in
transmission applications, focusing on designing a fault-tolerant ALU for ARM
processors. The proposed ALU corrects any 5-bit errors in 32-bit input registers,
demonstrating high reliability with low hardware overhead on Spartan-3 FPGA. The
study extends to satellite communication, using a (63, 36) BCH code to remove faulty
data in a new ALU system. Simulations and verification confirm its effectiveness,
offering advantages in reliability, area occupation, and fault coverage. Future work may
involve implementing the algorithm on a 64-bit microprocessor.

2. Gokul Govindu Et al. [2] addressed the challenge of implementing signal processing
algorithms on hardware, emphasizing the historical preference for fixed-point arithmetic
due to quantization effects. However, recent FPGA advancements have mitigated
performance and power overheads associated with floating-point arithmetic. The study
focuses on double precision matrix multiplication, a key embedded computing kernel,
demonstrating that FPGAs can outperform general-purpose processors in high precision
floating-point applications. Existing FPGA-based floating-point units often overlook
recent FPGA advances and lack application-centric performance analysis. The paper
offers a preliminary analysis and hints at future work involving extensive design trade-
offs and the release of an open-source library for floating-point units.

3. Dhiraj Sangwan Et al. [3] explored about the implementation of Adder/Subtractor and
multiplication functional units in hardware using VHDL. The focus is on achieving speed
and gate count optimization, with algorithms recast for clarity and lower cost designs.
The architectures, selected based on speed or area considerations, are synthesized using
Synopsys tools and implemented on Xilinx FPGA Compiler. For Text-To-Speech
conversion, the Floating-Point multiplication uses a sequential architecture with Booth's
Radix-4 recoding, resulting in reduced gate count (3811) and 15.2ns per cycle. Floating-
Point addition employs a combinational architecture, yielding a gate count of 4427 and
results in 77.94ns.

4. K. Dhanumjaya Et al. [4] designed low- and ultra-low-power 32-bit Arithmetic and
Logic Units (ALUs) for microprocessors. Guidelines are presented for high-performance
ALUs, analysing logical operation structures suitable for low-power applications. VHDL
modelling and simulations using XILINX ISE tools demonstrate a novel method for
choosing optimal flip-flop and latch designs, achieving an 18%-24% reduction in total
energy and up to 22%-32% reduction in leakage power in standby mode for 180nm-65nm
CMOS technologies.

5. Vahid Khorasani Et al. [5] introduced a 32-bit Fault-tolerant ALU hardware

implementation, comparing it with established techniques like Residue code and Triple
Modular Redundancy (TMR) in space applications. Utilizing a (63, 36) BCH codec on a
Spartan-3 FPGA, the new ALU design demonstrates the lowest hardware overhead and
can correct any 5-bit errors in 32-bit input registers.
PAGE \* MERGEFORMAT 2
Comparative analysis with other fault-tolerant methods reveals a 75% reduction in
hardware overhead, highlighting the proposed method's efficiency and high fault
coverage. Future research is suggested to explore time redundancy in fault-tolerant
methods for 32-bit ALU.

6. Shrivastava Purnima Et al. [6] discussed about VHDL environment for floating-point
arithmetic and logic unit (ALU) design with pipelining is introduced, presenting a novel
approach to enhance ALU performance. The top-down design incorporates four
arithmetic modules (addition, subtraction, multiplication, and division) organized through
sub-modules. Selection bits are utilized to choose operations, and the design is
implemented and validated using VHDL simulation in the Xilinx 12.1i platform. The
proposed pipelined floating-point ALU design is successfully tested with various vectors,
and ongoing research aims to further reduce hardware complexity for synthesis and
implementation on an Altera FPGA chip.

7. Kaushik Chandra Deva Sarma Et al. [7] designed and synthesised a 32-bit Arithmetic
Logic Unit (ALU) using VHDL and Xilinx ISE 9.1i, targeted for Spartan devices. The
ALU is crucial in CPUs, performing arithmetic, logical operations, and additional
functions like parity, overflow, and zero checking. The research introduces new
operations, enhancing the ALU's capabilities for modern VLSI industry requirements. The
designed ALU is verified through Xilinx ISE Design Suitev9.1i, confirming theoretical
consistency in various operations.

8. Liril George Et al. [8] presented the design of a 32-bit Arithmetic Logic Unit (ALU)
using VHDL for the central processing unit of a computer system. To address power
consumption concerns, clock gating is employed, resulting in a 10% reduction in dynamic
power compared to a non-clock-gated 32-bit ALU. The designed ALU, implemented on
Xilinx Spartan 3E FPGA, performs 7 arithmetic, 4 logical, and 2 shift operations. Its
efficiency is crucial for minimizing power usage and optimizing space in the CPU.

9. Naresh Grover Et al. [9] discussed about FPGAs traditionally favored fixed-point
algorithms due to resource efficiency, but there's a growing shift towards floating-point
implementations in response to evolving system requirements. The rapid progress in
FPGA technology enhances the appeal of floating-point arithmetic implementations,
offering advantages over Application Specific Integrated Circuits in terms of development
time, cost, and flexibility. A 32-bit floating-point arithmetic unit adhering to the IEEE 754
Standard is designed using VHDL and tested on Xilinx. Simulink models in MATLAB
validate the VHDL code, and optimization in MATLAB can improve performance
parameters.

10.Mahaveer Singh Sikarwar Et al. [10] implemented a 64-bit ALU with clock gating on
FPGA for low power and high-speed applications. Clock gating is employed to control
PAGE \* MERGEFORMAT 2
the clock signal activity, reducing power consumption during transitions and optimizing
performance. The proposed design is successfully implemented and simulated on Xilinx
XC3S500E FPGA, with software simulations conducted on the Xilinx ISE Test-bench
Simulator.

11.Mohammad Ziaullah Et al. [11] emphasized a specialized Floating-Point Unit (FPU) in

a computer system efficiently manages floating-point operations like addition,
subtraction, multiplication, and division. Implemented in Verilog, the FPU handles
Floating Point data, performs IEEE754 format conversions, and employs optimized
algorithms for arithmetic operations. This design ensures efficiency, speed, and increased
throughput, especially when pipelined. The addition of two floating-point numbers is
showcased by separately presenting the sign, exponent, and mantissa.

12.V. Prasanth Et al. [12] introduced a novel Look-Ahead Clock Gating (LACG) method,
combining synthesis-based, data-driven, and Auto-Gated Flip Flops (AGFF) approaches
to enhance digital architecture through VLSI technology. Focusing on power reduction in
digital systems, particularly in the Arithmetic Logic Unit (ALU) of CPUs, the LACG
method calculates clock enabling signals one cycle ahead based on present flip-flop data,
overcoming drawbacks of existing methods. The proposed technique is compared with
data-driven clock gating, demonstrating superior power efficiency in a 32-bit ALU
implemented on Xilinx FPGA using VHDL.

13.Neha Agrawal Et al. [13] focussed on creating an energy-efficient Arithmetic and Logic
Unit (ALU) on a 90nm Virtex-4 FPGA, addressing power dissipation concerns in high-
performance ALU design. Employing clock gating techniques and analysing various
input/output standards and Wi-Fi frequencies, the research achieves up to 95.35%
reduction in total power dissipation. Notably, the ALU performs most efficiently with
LVDCI_15 I/O standard, reaching a minimum power dissipation of 0.362 watts with
Clock Gating at 0.9GHz. The study observes a maximum percentage change in total
power reduction of 95.35% with LVCMOS15 I/O standard across different frequencies.

14.Shivangi Singh Et al. [14] presented a 32-bit ALU designed on a 40nm Virtex-6 FPGA
for Wi-Fi ah channel operation at 0.9 GHz. The design incorporates voltage scaling and
capacitance scaling to analyse their impact on power dissipation. Voltage scaling proves
efficient, achieving an 82.05% reduction in clock power when transitioning from 1.2V to
0.5V at 0.9 GHz. Capacitance scaling at different voltages demonstrates notable power
reductions, with a maximum total power reduction of 55.14% at 0.5V. The energy-
efficient ALU is tailored for the upcoming Wi-Fi channel at 0.9 GHz.

15.Mukesh Pandit Mahajan Et al. [15] designed and simulated a 64-bit Arithmetic and
Logic Unit (ALU) using VHDL and Xilinx ISE software. This ALU performs
fundamental operations such as arithmetic, logic, shift, and rotate through parallel
PAGE \* MERGEFORMAT 2
implementation of dedicated units. The synthesized top-level module is targeted for
Spartan devices, and verification against theoretical values confirms accurate
functionality for all performed operations.

16.Jozef Kulisz Et al. [16] discussed about FPGA-based Arithmetic and Logic Unit (ALU)
for a Programmable Logic Controller (PLC), executing 32 operations including logic,
comparators, and basic arithmetic on 32-bit words. Synthesized in Verilog and VHDL, the
design aligns with the EN 61131-3 norm for machine language compatibility. Future work
aims to optimize resources, exploring techniques like resource sharing and pipelining,
while expanding the ALU's instruction set to include more complex operations.

17.Shruti Murgai Et al. [17] introduced an optimized Arithmetic Logic Unit (ALU) using a
power-efficient carry select adder. The incorporation of a power-optimized full adder
results in significant reductions of 12.5% in total device power, 53.39% in hierarchy
power, and a 3% decrease in completion time. Implemented on a Kintex FPGA with
28nm technology in Verilog HDL, the ALU is successfully simulated on ModelSim 10.3c
and verified using System Verilog on QUESTASIM in a UVM environment.

18.Martin Langhammer Et al. [18] presented an innovative FPGA DSP Block architecture
supporting both fixed and single-precision floating-point arithmetic efficiently. It
overcomes routing and soft logic limitations by unifying the DSP Block, integrating a
separate adder/subtracter unit for floating-point operations, and enabling efficient
combination of multiplier and adder. This design achieves a cost-effective, low-power,
and high-density floating-point platform on 20nm FPGAs while maintaining fixed-point
performance. The proposed solution involves minimal impact on DSP block size for
future enhancements, including support for subnormal numbers.

19.J.P. Verma Et al. [19] highlighted the importance of VHDL in digital design through the
implementation of a 32-bit Arithmetic Logic Unit (ALU). The ALU, a vital CPU
component, executes 14 operations, including arithmetic, logical, and shift functions.
Using VHDL and Modelism 5.4a, the project successfully implements and tests the 32-bit
ALU on Xilinx FPGA, verifying theoretical accuracy via waveform analysis. Overall, the
project showcases VHDL's versatility in creating a robust and functional 32-bit ALU for
digital applications.

20.J. Thameema Begum Et al. [20] introduced a reconfigurable ALU, merging a 32-bit
floating-point adder/subtractor and an integer ALU. Implemented in Verilog HDL and
simulated with Model Sim on Spartan3E FPGA, the functional unit showed 25% slices,
9% slice flip-flops, and 18% 4-input LUTs. It achieved a max frequency of 81.614MHz
with a 82.46mW power consumption. Suited for data-parallel applications, the study
proposes future improvements, such as decoder use to cut power, addition of
multiplier/divider logic, and support for various input bit sizes.
PAGE \* MERGEFORMAT 2
21.Urvish Lakadiwala Et al. [21] designed an ALU on Xilinx Ise FPGA using Verilog
involves configuring the FPGA post-manufacturing. The ALU should feature three flags,
one output line, a reset line, two inputs, and the specified operation. The project aims to
demonstrate ALU implementation on the built microcontroller using Xilinx ise, with
validation through result matching to ensure successful construction.

22.Mohammed F. Tolba Et al. [22] designed and implemented an Arithmetic Logic Unit
(ALU) for mobile GPU processors, incorporating an Approximated Precision Shader and
Look-Up Table (LUT) multiplier. The ALU employs a combination of lookup table,
Wallace tree, and Carry Look ahead Adder (CLA) to enhance the speed of the multiplier
operation. Designed using Verilog and verified on Xilinx Virtex-5 XC5VLX30 FPGA, the
proposed ALU demonstrates improved performance by reducing the number of partial
products, resulting in faster multiplication. Simulations using Xilinx ISE confirm the
overall superior performance of the proposed ALU compared to conventional designs.

23.Asra Fatima Ghouri Et al. [23] designed a 32-bit ALU VLSI architecture, considering
factors like logic delay, wattage, and chip space. Various adder configurations are
explored, with the Carry-Skip-Adder meeting execution standards for the ALU. The
architecture utilizes mixed logic techniques, employing CMOS, pseudo NMOS, and pass
transistor techniques. The entire ALU is organized, simulated in HDL, and implemented
on FPGA Spartan 3E kits for real-time realization.

24.Shinjini Yadav Et al. [24] introduced a fault-tolerant Arithmetic and Logic Unit (ALU)
designed for digital transmission and storage systems, addressing the critical need for
reliability in VLSI technology and digital processing. The study focuses on Binary BCH
codes, a subclass of error-correcting codes, implemented through the Xilinx ISE tool for a
32-bit ALU. The proposed BCH codec synthesis system utilizes innovative approaches,
such as a novel sum of products circuit and a Dual-Polynomial Basis Multiplier
architecture, enhancing efficiency and reducing hardware requirements for error detection
and correction in digital systems.

24.Vikesh Ukande Et al. [25] focussed on designing and verifying a 32-bit Arithmetic and
Logical Unit (ALU) using VHDL. This ALU accepts two 32-bit numbers and a user-
provided operation code, performing arithmetic and logical operations such as addition,
subtraction, shifting, increment, and decrement. The VHDL code was successfully
simulated in I-Sim to obtain waveforms, and synthesis was carried out using Xilinx ISE
for implementation on the Xilinx platform. The project aims to enhance the
computational capabilities of a computer's Central Processing Unit (CPU) through the
designed ALU

PAGE \* MERGEFORMAT 2
25.Dil Muhammed Akbar Et al. [26] presented a thermal energy-efficient 32-bit ALU
design for network processors, employing six different SSTL I/O standards on 28nm
technology. Analysis of power consumption at various frequencies based on IEEE 802.11
standards reveals that SSTL15 demonstrates the highest thermal efficiency, particularly at
minimum and maximum temperatures. Notably, SSTL135_R exhibits a maximum 8.88%
thermal power reduction when WLAN devices shift from 343.15K to 283.15K across
different frequencies, highlighting its effectiveness for thermal optimization in electronic
devices such as WLANs. However, caution is warranted at 60 GHz, where junction
temperature exceeds the absolute maximum.

26.Kaushal Kumar Sahu Et al. [27] introduced a novel 32-bit Arithmetic and Logical Unit
(ALU) architecture for graphics processors, addressing issues of power consumption and
performance. The design incorporates four sub-ALU blocks of eight bits each, featuring
varying levels of accuracy (100%, 98%, 95%, and 90%). Implemented on Xilinx 14.2
with 28nm technology FPGA (Artix-7), the architecture demonstrates a significant 75%
improvement in delay and frequency, a 50% enhancement in power efficiency, and a 72%
reduction in area compared to existing approaches.

27.Prachi Sharma Et al. [28] presented the construction of a 16-bit Arithmetic Logic Unit
(ALU) using VHDL in Xilinx Vivado 14.7 and implementation on a Basys 3 Artix 7
FPGA board. The study involves simulation, synthesis, and analysis of ALU parameters,
focusing on efficiency measures such as speed, power, and utilization. The designed ALU
operates on 32-bit operands, with potential for extension to 64-bit precision and
additional mathematical operations like trigonometric and logarithmic functions. The
comprehensive flow of RTL design is explored, encompassing top-level RTL module
design, verification through simulation, synthesis to obtain gate-level netlist, and
successful implementation on the FPGA.

28.Manjusha M. Kinage Et al. [29] focussed on addressing time-to-market pressures and

improving productivity in embedded systems by utilizing Commercial-Off-The-Shelf
(COTS) devices to design a soft core processor with dual cores. The processor
incorporates various COTS components such as UART, I2C, RAM, ROM, and ALU.
VHDL programming is employed to create two processor cores that communicate within
a single program. Future steps involve analyzing power consumption using Xilinx Power
Analyzer, optimizing design with XPA's power optimization options, and evaluating the
trade-off between power, area, and delay.

29.Farzin Piltan Et al. [30] implemented a 4-bit Field Programmable Gate Array (FPGA)-
based Minimum Control Unit (MCU) with carry lookahead algorithm on a Spartan 3E
FPGA using Xilinx software. The MCU controls data transfer and processes input data for
an Arithmetic Logic Unit (ALU). The use of hardware description language (HDL) and
PAGE \* MERGEFORMAT 2
lookahead design significantly reduces propagation and contamination delays, achieving
a 10% delay reduction compared to ripple carry. The carry lookahead ALU enhances
execution speed, providing a 25% timing improvement over ripple carry ALU for high-
speed controllers.

30.Abdul Rehman Buzdar Et al. [31] designed a 32-bit Arithmetic Logic Unit (ALU) for a
CPU using Ripple Carry Adder (ALU-RCA) and Sklansky Adder (ALU-SKL) circuits.
VHDL is employed for implementation on a 130nm CMOS platform by ST
Microelectronics using Cadence EDA tools. Synthesis results reveal that ALU-RCA
exhibits faster changes in area to meet strict timing constraints, while ALU-SKL
maintains better efficiency in terms of area and power consumption. The study suggests
selecting ALU-RCA for lower timing constraints, prioritizing area and power efficiency.

31.Amana Yadav Et. al. [32] discussed about the crucial role of the Floating Point Unit
(FPU) as an integral component in advanced processors, emphasizing its application in
high-performance tasks such as mathematical analysis and signal processing. The FPU
handles intricate arithmetic operations on floating-point numbers, adopting the IEEE 754
format, either in 32-bit (single precision) or 64-bit (double precision). Using VHDL, the
paper outlines the detailed procedures for computing addition, subtraction, and
multiplication operations, and subsequently simulates and synthesizes the designed FPU.
Performance evaluation on a Vitex 5 FPGA module includes metrics such as area
occupied and delay, with specific details on total combinational logic delay and routing
delay provided. The conclusion suggests potential optimizations for the prenormalization
and post-normalization units of the FPU to reduce hardware requirements and delay.

32.Swamynathan Et. al. [33] discussed the design and implementation of a 32-bit
Arithmetic Logic Unit (ALU) using Verilog HDL with logical gates such as AND and
OR. The ALU is a critical component in digital system design and is commonly found in
processors, calculators, cell phones, and computers. The focus of the design is on
reducing power consumption by utilizing reversible logic gates, which have gained
importance in low-power VLSI design techniques. The implemented ALU based on
reversible logic exhibits a significant reduction in power consumption, approximately
5.1%, compared to ALUs designed with non-reversible logic gates. The design also
achieves improvements in delay and power dissipation, making it a promising
advancement in the field of low-power processor architecture.

33.Sharifah Mumtazah Et. al. [34] implemented an Euclid's algorithm for computing the
greatest common divisor (GCD) of non-negative integers, a fundamental operation in
public key cryptographic algorithms. The focus is on developing a fast GCD coprocessor
with variable precisions ranging from 32-bit to 1024-bit. The implementation is evaluated
across seven field programmable gate arrays (FPGA) chip families, including Altera and
PAGE \* MERGEFORMAT 2
Xilinx devices, based on factors such as maximum frequency, total delay values,
hardware utilization, and FPGA thermal power dissipation. Results indicate that the
proposed coprocessor, particularly on Xilinx Vertix-7 XC7VH290T-2-HCG1155 and
XC7K70T-2-FBG676 devices, achieves impressive maximum frequencies, minimal
resource utilization, and scalability. The paper concludes that the designed coprocessor is
faster than many state-of-the-art solutions, offering up to two times higher throughput
efficiency in GCD computations for different data path sizes.

34.Prathyusha Kuncha Et. al. [35] proposed a low-power 1-bit CMOS Arithmetic and
Logic Unit (ALU) designed to minimize power consumption and delay. The focus on low
power is essential for enhancing circuit efficiency, reducing heat dissipation, and
extending battery life in electronic devices. The ALU processor employs the FANIN
technique with universal gates realization, and the design is simulated and analyzed using
Tanner and TSpice software. The 1-bit ALU is a fundamental building block in the central
processing unit of a computer, and this work specifically explores gate-level analysis with
the FANIN concept to achieve low-power circuits with a reduced circuit size. The
conclusion presents power values for NAND and NOR realizations, highlighting that
NOR realization consumes less dynamic power due to reduced switching time, especially
in 4-input NOR configurations. The paper demonstrates that decreasing the supply
voltage (VDD) in the 4-input NOR realization further reduces power consumption,
ultimately leading to the development of a low-power 8-bit ALU.

35.Naresh Grover Et. al. [36] discussed about the importance of efficiency and speed in the
digital design domain, with a specific emphasis on asynchronous processors to address
challenges in synchronous architectures. The advantages of asynchronous processors,
particularly in System on Chip (SOC) applications, are highlighted, including reduced
crosstalk, ease of multi-rate circuit integration, reusability, and lower power consumption.
The objective is to design and simulate the control unit of a 32-bit asynchronous
processor using VHDL and Xilinx ISE tool. The paper presents a robust control unit
responsible for managing the overall functioning of the processor, and optimization
techniques for reducing area, power, and delay constraints in digital circuits using FPGA
are discussed. The conclusion summarizes the design and implementation of the control
unit, showcasing its role in processor instruction flow and demonstrating results through
simulation windows, underscoring the significance of circuit optimization in the research
topic.

36.Vamsi Krishna Et. al. [37] addressed the critical issue of power dissipation and heat
generation in modern computer chips by proposing a 32-Bit Arithmetic Logic Unit (ALU)
built using reversible decoder-controlled combinational circuits. Reversible logic,
inspired by zero-energy computation, offers a solution to reduce power consumption and
has diverse applications in low power CMOS design, Quantum & Optical computing,
PAGE \* MERGEFORMAT 2
Nano-Technology, and DSP. The designed ALU, implemented on Spartan3E (XC3S500E-
FG320-5) FPGA, demonstrates 1.6 times less dynamic power consumption compared to
conventional designs. Moreover, it occupies only 3% of the total memory in the FPGA,
resulting in a 91% reduction in area when compared to traditional designs. The Verilog-
modeled architecture in Xilinx ISE Design Suite 14.3 showcases the potential of
reversible logic in achieving more efficient and compact computational units for various
applications. The conclusion suggests that future designs can further optimize area,
power, and delay for superior performance.

37.Rakhi Nangia Et. al. [38] introduced an innovative approach to designing Arithmetic
Logic Units (ALUs) by breaking down the regular pattern of ALU into identical stages
connected in a cascade through a carry chain. The design is implemented and tested for
various bit widths, including 4, 8, 16, 32, and 64 bits. Resource and functionality sharing
techniques are employed to achieve significant hardware savings, with a focus on
utilizing a single resource (parallel adder) for different functionalities through a control
circuit. The design is implemented in a 3s700anfgg484-4 FPGA, demonstrating a
hardware saving of 66% for 4-bit ALU, 65% for 8 and 16 bits, and 60% for 32 and 64 bits
when compared to a normal function-by-function design. The conclusion emphasizes the
flexibility of the bit slice implementation, allowing the creation of variable-width ALUs
and cascading multiple slices for broader applications while achieving substantial
hardware savings.

38.Sakshi Samaiya Et. al. [39] designed, implemented, and characterized a low-cost FPGA-
based bioimpedance measurement system. The system incorporates signal generation and
processing circuits within the FPGA ALU, along with the NIOS II embedded processor.
Additionally, it includes an Analog-to-Digital conversion board and a front-end
previously designed by the instrumentation and biomedical engineering group. The
communication mechanism between the FPGA ALU and the computer is also designed
and implemented using VHDL. Performance analysis reveals a maximum frequency of 87
MHz, and the proposed method demonstrates improved security and frequency levels in
FPGA implementation. Despite the expected functionality during the characterization
stage, some degradation in results is observed due to spurious effects from the front-end
and transformers on the THDB-ADA, indicating a need for further refinement.

39.Nuray Saglam Bedir Et. al. [40] designed and implemented a 64-bit Arithmetic Logic
Unit (ALU), a crucial component in the Central Processing Unit (CPU) core for
performing various arithmetic, logical, and shift-rotate operations. Using Very High-
Speed Integrated Circuits Hardware Description Language (VHDL) and Altera Field
Programmable Gate Arrays (FPGA), the ALU is synthesized and simulated with Altera
Quartus II and Modelsim-Altera software. Notably, the design accommodates the
PAGE \* MERGEFORMAT 2
processing of signed numbers, utilizing a Conditional Sum Adder (COSA) in the addition
operation for faster performance and reduced propagation delay. The design is successful
in handling overflow, undefined situations, and errors, showing efficiency in all
operations regardless of number sign. The use of VHDL's "component" structure aids in
managing complexity, improving readability, and ensuring compatibility with other
designs, albeit with an extended simulation time. The proposed design outperforms
previous studies, achieving a clock rate of 50 MHz with a COSA adder and demonstrating
potential for further clock frequency enhancement using FPGA's PLL structure.

40.Ankit Trivedi Et. al. [41] explored the significance of roughly computing, a popular
paradigm in the Internet of Things and big data era, particularly leveraging error-tolerant
features for resource-efficient computations in applications like machine learning and
signal processing. The focus is on the Floating Point Unit (FPU), a crucial component in
scientific computation and signal processing, with an emphasis on the IEEE 754 standard
for 32-bit operand arithmetic operations. The study delves into the multiplier's impact on
system performance and introduces a floating-point unit designed for multiplication,
addition, and subtraction functions. The paper emphasizes the importance of a power-
efficient 32-bit single-precision FPU based on the IEEE-754 standard for reduced
hardware requirements, lower power consumption, and minimized delays. Future
directions may include the exploration of Vedic mathematics for further improvements.

41.Jitesh Shinde Et. al. [42] introduced a step-by-step optimization approach for the
Arithmetic Logic Unit (ALU) at the logic circuit level, emphasizing resource sharing and
optimized arithmetic expressions. The work utilizes tools like Deeds Digital Circuit
Simulator and Aldec’s Active HDL to teach digital circuit design concepts effectively.
The empirical measurements compare the VLSI implementation of three entities –
'alugen,' 'alugenopt,' and 'aluopt,' showcasing the impact of resource sharing and
optimized expressions on area requirement, delay, and power in ALU. The study reveals
that the approach suggested in the paper leads to better savings in area utilization, delay,
and power during the VLSI frontend design. Additionally, the paper suggests the potential
for further optimizations in full adder blocks and hints at the benefits of implementing the
ALU design approach at the VLSI backend level.

42.Sai Lakshmi Et. al. [43] presented a comparative study of various adders, including
MUX-based adders, pass transistor adders, and 2-T logic-based adders, emphasizing their
design and performance in terms of power consumption and dead time. The proposed
circuit outperforms existing methods in terms of area and delay, crucial factors in VLSI
design complexity. The focus is on 32-bit adders, such as Ripple Carry Adder (RCA),
Carry Increment Adder (CINA), and Carry Bypass Adder (CBYA), implemented using
Verilog HDL in Xilinx 14.5 ISE for the Spartan 3E family device. The results highlight
PAGE \* MERGEFORMAT 2
the trade-offs between LUTs, slices, fan-out, and delay for each adder type, indicating
that the modified full adder contributes to reduced delay and improved performance.

The study suggests potential extensions to evaluate and compare adders of different sizes
and types, such as carry-save adders and carry-skip adders.

43.Tanaji Dudhane Et. al. [44] focussed on enhancing the relevance of 8-bit CPU
architecture in the era of 64-bit systems. It introduces a Co-operative Arithmetic Logic
Unit (ALU) that collaborates with the existing ALU to perform 16-bit operations without
compromising the integrity of the original 8-bit architecture. The Co-operative ALU is
integrated into a 2-stage pipelined 8-bit Reduced Instruction Set Computing (RISC)
architecture, serving as an extension to provide new functionalities. The design includes
specially crafted instructions for the Co-operative ALU, ensuring seamless integration
with the original PIC RISC architecture. Performance analysis, conducted using Xilinx
platform tools, demonstrates a reduction in total execution time and power consumption
when compared to the original PIC RISC architecture.

44.Ganeswar Sahu Et. al. [45] addressed the significant power dissipation in general-
purpose CPUs, particularly in the Arithmetic Logic Unit (ALU), a crucial component
responsible for arithmetic and logic operations. To mitigate power consumption, the
design proposes an ALU with a latch-free clock gating technique, aiming to reduce clock
and dynamic power by deactivating unused areas during specific operations. The study
also explores the use of a Carry Select Adder (CSLA) to enhance computational
efficiency. A gate-level modification is introduced to the CSLA structure, resulting in a
16-bit Square Root CSLA (SQRT CSLA) architecture with reduced area and power
consumption compared to the regular SQRT CSLA, albeit with a slight increase in delay.
The proposed low-power 16-bit ALU is implemented on Xilinx Spartan 3E FPGA,
demonstrating reduced power consumption compared to conventional ALU designs.

45.Hariom Kumar Et. al. [46] introduced a floating-point addition and subtraction
algorithm along with their pipeline design for IEEE single-precision floating-point
numbers. The complexity of these algorithms makes their implementation challenging on
FPGAs, particularly for scientific applications requiring high accuracy. The study
explores the trade-off between area and speed to enhance accuracy in the results. Bit-
parallel adders are employed for both addition and subtraction, implemented in VHDL
language and compiled using the Xilinx ISE compiler for FPGA kits. The pipelined
design of the floating-point adder and subtractor units improves performance, enabling
the execution of multiple instructions simultaneously. The results indicate efficient trade-
offs between area and speed for enhanced accuracy in IEEE single-precision floating-
point arithmetic using Xilinx ISE.
PAGE \* MERGEFORMAT 2
46.T. Ravi Et. al. [47] focussed on integrating the computing capabilities of an 8-bit
processor with a 16-bit cooperative arithmetic and logic unit (CALU) and an 8x8 bit
multiplier.

The objective is to enhance the processor's functionality while maintaining a simple

architecture and low power consumption. The implementation involves the development
of a 16F84 RISC processor module using HDL language and FPGA technology. Results
indicate significant cycle savings, with a 76.09% reduction in execution time achieved
through the integration of the 16-bit CALU and 8x8 bit multiplier. The designed
processor is configured on Xilinx 14.6 using FPGA and implemented using Verilog or
HDL. The 8-bit processor, with modifications, proves efficient for various applications,
offering low power consumption and improved execution times on FPGA.

47.Vatsala Sharma Et. al. [48] proposed the design and implementation of an optimized 64-
bit Arithmetic Logic Unit (ALU) for processors, with the flexibility to reduce its size to
16-bit or 32-bit. The optimization process involves a two-level approach, initially
decreasing FPGA resource utilization through recycling and reusing resources for various
operations. The final optimization stage focuses on making only one block active at a
time, reducing dynamic power consumption significantly. The simulation results and
power reports indicate that the model works correctly and achieves energy efficiency.
Further enhancements involve using tri-state logic to disable unnecessary blocks and
removing certain functions to reduce FPGA resource usage and power consumption.

48.Muhammad Ikmal Mohd Taib Et. al. [49] focussed on designing a 16-bit Arithmetic
Logic Unit (ALU) using VHDL, a crucial component in embedded systems for various
applications like cell phones and computers. The ALU performs essential arithmetic and
logical operations, including addition, subtraction, and logical AND/OR. The project
specifically targets the implementation of 16-bit division and multiplication operations
using Altera Quartus II software. Simulation results demonstrate the successful
performance of the proposed ALU design in carrying out these operations. Overall, this
project showcases the versatility and functionality of ALUs in computing devices.

49.Suhas Shirol Et. al. [50] introduced a novel approach to Arithmetic Logic Unit (ALU)
design, emphasizing the importance of low-power Very Large-Scale Integration (VLSI).
The proposed ALU incorporates both reversible and irreversible logic gates, creating a
hybrid design aimed at minimizing power dissipation and delay. To enhance the digital
adder's efficiency, the design utilizes carry select adder (CSA) and Kogge–Stone adder
(KSA) to reduce carry propagation time, while also employing a binary-to-excess-one
converter (BEC) instead of the conventional ripple carry adder (RCA) to decrease area.
PAGE \* MERGEFORMAT 2
The application of this adder design is extended to Vedic multipliers, further optimizing
delay in digital multiplication. The entire design is implemented on Spartan 6 FPGA
using Verilog coding and Chip Scope Pro for validation.

50.Sateesh Kourav Et. al. [51] presented the design and implementation of a 64-bit
Arithmetic Logic Unit (ALU) using VHDL, simulated on a Xilinx simulator. The ALU is
highlighted as a fundamental building block of a processor, performing various logical,
arithmetic, and shifting operations. The proposed design focuses on applications in areas
such as automobile and control systems. The ALU's significance in the Central
Processing Unit (CPU) is emphasized, addressing arithmetical operations, logical
operations, and shift-rotate operations. The paper discusses the challenges of power
consumption in complex ALUs and introduces a new ALU architecture to enhance
dynamic, on-the-fly operation support. It also explores the synthesis of fixed-point
arithmetic using VHDL and FPGA implementation, emphasizing the handling of
increasing challenges in modern processors. The research outlines a 64-bit
microprocessor without Reduced Instruction Set Computing (RISC) based on interlocked
pipeline stages, specifically MIPS, and examines the VHDL design process. The ALU
design is studied, incorporating various mathematical operations and demonstrating their
implementation using simple gates. The project, implemented using VHDL and simulated
with Xilinx 9.2i ISE, showcases the effectiveness of the proposed design and its potential
applications.

51.Avinash Gour Et. al. [52] introduced a reconfigurable adaptive fault-tolerant system for
a 32-bit Arithmetic Logic Unit (ALU) in the electronic industry, emphasizing the
importance of reducing testing costs to benefit large-scale VLSI circuit production. The
implemented ALU focuses on low power consumption, area efficiency, and minimal
delay. Key challenges in fault-tolerant system design, including fault detection and
correction during operation, are addressed through the incorporation of online checkers
and redundancy concepts. The paper details the utilization of various hardware
redundancy techniques, along with partial dynamic reconfiguration, in the design of the
32-bit ALU. The achieved fault coverage is reported to be 100%, highlighting the
effectiveness of the proposed fault-tolerant system.

52.Sateesh Kourav Et. al. [53] focussed on the analysis of a 64-cycle Arithmetic Logic Unit
(ALU) emphasizing low power consumption and high velocity. The Carry Look Ahead
Technique is employed using the VHDL language to enhance speed and reduce the area
of operation constraints. The proposed ALU design incorporates arithmetic and logic
operations, such as Addition, Subtraction, Multiplication, Increment, Decrement, Logical
AND, Logical OR, Logical XOR, etc. The design is aimed at efficient execution of
mathematical, logical, and shifting tasks within a computer. The modules of the ALU are
efficiently designed using Xilinx software, and simulation results are verified on a single
platform through a test bench, demonstrating improved system performance and reduced
power consumption.

PAGE \* MERGEFORMAT 2
53.Yamini Divya Et. al. [54] designed and implemented of various 32-bit adders, including
Ripple Carry Adder (RCA), Carry Increment Adder (CIA), and Carry Skip (or) Carry
Bypass Adder (CSKA), using Verilog HDL. The significance of 32-bit adder design is
emphasized due to its common usage in digital systems and processors. The work
presents results obtained through Verilog code execution in Xilinx 14.5 ISE for the
Spartan 3E family device. The synthesis results indicate that the LUTs are less for RCA,
and CSKA with FA1 exhibits a 36% reduction in delay compared to RCA with FA,
providing the least delay. The paper suggests the potential extension of this work to
different-sized adders and other 32-bit adder designs like carry save adder, carry look
ahead adder, and carry select adder.

54.Yadav Ranjeeta Et. al. [55] enhanced the functionality of an FIR Filter by modifying its
internal components, specifically using an ALU-based algorithm with a focus on Adder
and Multiplier blocks. The design is targeted at optimizing area parameters while
evaluating static and dynamic power consumption. The implementation involves writing
the programming language in VERILOG and utilizing Xilinx ISE suite for simulation and
design realization. The FIR Filter is designed with 16 input samples and 16 coefficients
generated through MATLAB. The conclusion highlights the success in achieving reduced
area consumption and the importance of understanding programming languages like
VERILOG, MATLAB, and the user-friendly Xilinx ISE suite for digital circuit design and
improvement.

55.Tevhit Karacali Et. al. [56] designed an Arithmetic Logic Unit (ALU) using FPGA
capabilities, particularly the Xilinx Zynq-7000 integrated circuit, which offers low power
consumption and parallel processing. The ALU is implemented to handle IEEE754 32-bit
floating-point numbers, accommodating fractional, large, and negative values. VHDL is
employed as the programming language for the ALU design. The study extends to
programming the dual-core ARM Cortex-A9 processor within the Zynq-7000 chip to
command and control the designed ALU. A system is developed to facilitate
communication with a PC via UART using the Zedboard Development Kit. The study
emphasizes the applicability of the designed ALU in various applications and its
compatibility with IEEE754 floating-point number format for high-precision calculations.

56.Jallu Swathi Et. al. [57] designed a 32-bit Arithmetic and Logic Unit (ALU), the
fundamental component of a computer's central processing unit (CPU). The ALU is
implemented to perform 16-bit addition, 16-bit multiplication, and two logical operations,
utilizing components such as a 16-bit carry skip adder for addition, Booth's Algorithm for
multiplication, and reversible logic gates for logical operations. The design is described
using a high-level hardware description language and synthesized with Xilinx ISE 14.7.
The study concludes by extending the behaviorally modeled 4-bit ALU to a structurally
modeled n-bit ALU, highlighting improved flexibility and reusability. The performance
analysis demonstrates enhanced efficiency, suggesting potential use for designing
multiple-bit ALUs with parallel computation.

PAGE \* MERGEFORMAT 2
57.Chandrashekhar Patel Et. al. [58] designed a 4-bit Arithmetic and Logic Unit (ALU)
for a RISC processor, capable of performing 16 arithmetic and logical operations. The
design is implemented using Vivado simulation tools and the SP701 Spartan FPGA
board. The study aims to enhance energy efficiency in FPGA-based ALUs, utilizing
energy-efficient IO standard approaches. The findings include a new method for building
energy-efficient ALUs, with a focus on power usage at pre- and post-levels. The research
concludes by emphasizing the potential benefits of the proposed ALU design for
emerging technologies like the Internet of Things, supporting environmental initiatives.

58.Jagadeeswar Reddy Et. al. [59] introduced the design and implementation of a 32-bit
Arithmetic Logic Unit (ALU) using Verilog HDL, with a focus on reducing power
consumption through reversible logic gates. The ALU is a critical component in various
systems, including calculators, cell phones, and computers. The design incorporates
logical gates such as AND and OR for each one-bit ALU circuit. The implementation in
Xilinx demonstrates improved speed and lower power consumption compared to
traditional ALU processors. The study emphasizes the significance of reversible logic in
reducing power dissipation and presents a comparison between ALUs based on reversible
and irreversible logic gates. The designed ALU achieves notable improvements in power
inflation, delay reduction, and chip area utilization over previous works.

59.Anuradha Et. al. [60] addressed the increasing stress on processors due to complex
tasks, leading to a rise in processing cores. To alleviate this stress, coprocessors are
assigned to specific tasks, such as signal processing. The response time of the Arithmetic
Logic Unit (ALU) depends on the speed of the multiplier, making it a crucial part of the
CPU. Vedic mathematics, specifically the Urdhva Tiriyagbhyam and Nikhilam
algorithms, is employed for quick multiplication operations, reducing space, power, and
delay in processors. The project utilizes Verilog HDL to design and specify these
multipliers, and Xilinx ISE Project Navigator for synthesis and simulation, aiming to
enhance operating speed and implement a fundamental approach for various applications
and cryptography. The proposed Vedic mathematics-based Nikhilam architecture for
binary number system multiplication is designed alongside existing Wallace Tree and
Urdhva Tiriyagbhyam structures, demonstrating versatility in architectures through
Verilog HDL and Xilinx ISE Tool.

60.Girija Sanjeevaiah Et. al. [61] introduced an efficient 32-bit multi-functional reversible
arithmetic and logical unit (MF-RALU) designed using reversible logic gates to address
power dissipation concerns in modern computer applications. The MF-RALU performs
30 operations, including advanced additions and multiplications, utilizing multi-bit
reversible multiplexers. A Reduced Instruction Set Computer (RISC) processor is
developed to validate the functionality of the MF-RALU, operating within a single clock
cycle. The 1-bit RALU, serving as the basic building block, is compared with existing
approaches, showcasing improved performance metrics. Synthesized and implemented on
Artix-7 FPGA using Verilog-HDL in the Vivado environment, the MF-RALU occupies

PAGE \* MERGEFORMAT 2
<11% chip area and consumes 332 mW total power, while the RISC processor utilizes
<3% chip area, operates at 483 MHz frequency, and consumes 159 mW total power. The
work suggests possibilities for extension to higher-bit architectures and additional
operations.

61.Stefano Di Matteo Et. al. [62] introduced a hardware accelerator designed for the SEAL-
Embedded library, catering to resource-constrained platforms and specifically tailored for
embedded devices. The accelerator features a configurable Number Theoretic Transform
(NTT) unit for various polynomial degrees, an optimized memory architecture to reduce
I/O latency, and a dedicated module for generating roots of unity. Implemented on a
Xilinx ZCU106 FPGA board with a 32-bit RISC-V (RI5CY) processor, the hardware
accelerator demonstrated a significant speed-up of approximately x1000 compared to the
pure software implementation of the SEAL-Embedded library for the symmetric
encryption function. The results indicate the potential for substantial performance
improvements in privacy-preserving computations on edge devices.

62.Bijan Vosoughi Vahdat Sharif Et.al. [63,5] presented a hardware implementation of a

32-bit Fault-tolerant ALU using BCH code on Spartan-3 FPGA, demonstrating the lowest
hardware overhead and the ability to correct any 5-bit error. Comparative analysis with
other fault-tolerant methods, such as Residue code and Triple Modular Redundancy,
reveals a 75% reduction in hardware overhead with superior fault coverage. The proposed
system is validated through simulation on Modelsim 6.2b and performance verification
on ISE 8.2i, paving the way for future research on time redundancy in 32-bit ALU fault-
tolerant methods.

63.M.K.Soni Et.al.[64,9] discussed about the fixed-point algorithms due to resource

efficiency, but rising system demands and FPGA capabilities now drive a shift towards
floating-point implementations. The latest FPGA technology enables cost-effective and
flexible development of 32-bit floating-point arithmetic units, adhering to IEEE 754
standards, validated through VHDL code on Xilinx and further verified using Simulink
models in MATLAB.

64.Lalan Kumar Mishra Et. Al [65,7] designed and synthesised a 32-bit Arithmetic Logic
Unit (ALU) using VHDL in Xilinx ISE 9.1i, targeted for Spartan devices. The ALU is
capable of performing various arithmetic and logical operations, including addition,
subtraction, overflow detection, AND, OR, XOR, XNOR, NOT operations, parity check,
1’s and 2’s complement, and comparison. The inclusion of novel features such as flags
(Zero, Carry, Odd Parity), a Zero Counter, and additional operations makes it a valuable
contribution to FPGA-based technology.

65.Bishwajeet Pandey Et. Al. [66,14] designed a 32-bit ALU on a 40nm Virtex-6 FPGA,
optimized for Wi-Fi ah channel at 0.9 GHz, anticipated in 2016. Utilizing voltage and
capacitance scaling, the study examines power dissipation variations, operating the ALU
at different voltages with capacitance adjustments. Verilog is employed as the Hardware
PAGE \* MERGEFORMAT 2
Description Language, and XPower Analyser and Xilinx ISE Design Suite 14.2 are used
for power calculation and simulation, respectively.

66. Veeresh Pujari Et. al. [67,23] designed a 32-bit ALU VLSI architecture with
considerations for logic delay, wattage, and chip space. It explores various adder
configurations to meet execution standards, ultimately utilizing mixed logic techniques
such as CMOS, pseudo NMOS, and pass transistor for organizing digital functions. The
finalized ALU architecture is simulated in HDL and implemented on FPGA Spartan 3E
kits for real-time realization.

67.Abdullah Buzdar Et. al. [68,31] implemented of a basic Arithmetic Logic Unit (ALU) in
VHDL using two different adder circuits: a ripple carry adder and a sklansky type adder.
The ALU is designed on a 130nm CMOS platform using ASIC technology from ST
Microelectronics, with Cadence EDA tools employed for implementation. The
comparative analysis focuses on area, power, and timing requirements of the two ALU
circuits.

68.Mirosław Chmiel Et. al. [69,16] introduced an Arithmetic and Logic Unit (ALU) for a
prototype Programmable Logic Controller (PLC) implemented in an FPGA device. The
ALU supports 32 operations, encompassing basic logic, comparators, and fundamental
arithmetic operations for both fixed-point and floating-point numbers. The hardware-
based implementation ensures high-speed performance, and the synthesizable Verilog and
VHDL models are easily adaptable to other FPGA architectures or ASICs.

70.Sunil Shah Et. al. [70,51] designed a 64-bit Arithmetic Logic Unit (ALU) in VHDL for
processors, particularly beneficial in automotive control systems, handles arithmetic, logical,
and shifting operations. The proposed ALU architecture ensures dynamic on-the-fly support
for precise operations, addressing the trade-offs of complexity, cost, space, and power
consumption. The VHDL-synthesized fixed-point arithmetic core, validated through
simulations and FPGA implementation, enhances computational precision without
compromising system performance.

PAGE \* MERGEFORMAT 2
3.EXPLORING ALU ARCHITECTURES-BITWISE INSIGHTS

3.1-Functional Blocks of a Microprocessor

The microprocessor is a programmable IC that is capable of performing arithmetic and
logical operations. The basic functional blocks of microprocessor are ALU, flag register,
register array, Program Counter (PC)/Instruction Pointer (IP), instruction decoding unit,
timing, and control unit. The basic functional block diagram of a microprocessor is shown
below figure:

ALU is the computational unit of the microprocessor which performs arithmetic and logical
operations on binary data. The various conditions of the result are stored as status bits called
flags in the flag register. For example, consider a sign flag, one of the bit positions of the flag
register is called the sign flag and it is used to store the status of a sign of the result of the
ALU operation (output data of ALU). If the result is negative, then “1” is stored in the sign
flag and if the result is positive, then “0” is stored in the sign flag.

The Register array is the internal storage device and so it is also called Internal memory. The
input data for ALU, the output data of ALU (a result of computations), and any other binary
information needed for processing are stored in the register array. For any microprocessor,
there will be a set of instructions given by the manufacturer of the microprocessor. For doing

PAGE \* MERGEFORMAT 2
any useful work with the microprocessor, we have to write a program using these
instructions and store them in a memory device external to the microprocessor.

The program counter generates the address of the instructions to be fetched from the memory
and send through the address bus to the memory. The memory will send the instruction
codes and data through the data bus. The instruction codes are decoded by the decoding unit
and sent information to the timing and control unit. The data is stored in the register array for
processing by ALU. The control unit will generate the necessary control signals for the
internal and external operations of the microprocessor.

3.2-Organization of Arithmetic and Logical Unit

 The arithmetic and logic unit is an 8-bit unit.

 It performs arithmetic, logic and rotates operations.
 It consists of the binary adder to perform addition and subtraction by 2's complement
method.
 The result is typically stored in an accumulator.
 Accumulator, temporary register and flag register are closely associated with A.L.U.
 The temporary register is used to hold data during an arithmetic/ logic operation.
PAGE \* MERGEFORMAT 2
 The flags are set or reset according to the result of operations in the status register.

3.3-Introduction to N-bit ALU

Arithmetic and Logic Units (or ALUs) are found at the core of microprocessors, where they
implement the arithmetic and logic functions offered by the processor (e.g., addition,
subtraction, AND’ two values). An ALU is a combinational circuit that combines many
common logic circuits in one block. Typically, ALU inputs are comprised of two N-bit
busses, a carry-in, and M select lines that select between the 2^M.2M ALU operations. ALU
outputs include an N-bit bus for function output and a carry out.

Fig. Basic Symbol of N-bit ALU

ALUs can be designed to perform a variety of different arithmetic and logic functions.
Possible arithmetic functions include addition, subtraction, multiplication, comparison,
increment, decrement, shift, and rotate; possible logic functions include AND, OR, XOR,
XNOR, INV, CLR (for clear), and PASS (for passing a value unchanged). All of these
functions find use in computing systems, although a complete description of their use is
beyond the scope of this document. An ALU could be designed to include all of these
PAGE \* MERGEFORMAT 2
functions, or a subset could be chosen to meet the specific needs of a given application.
Either way, the design process is similar (but simpler for an ALU with fewer functions).

As an example, we will consider the design of an ALU that can perform one of eight
functions on 8-bit data values. This design, although relatively simple, is not unlike many of
ALUs that have been designed over the years for all sizes and performance ranges of
processors. Our ALU will feature two 8-bit data inputs, an 8-bit data output, a carry-in and a
carry out, and three function select inputs (S2, S1, S0) providing selection between eight
operations (three arithmetic, four logic and a clear or ‘0’).
Targeted ALU operations are shown in the operation table below. The three control bits used
to select the ALU operation are called the operation code (or op code), because if this ALU
were used in an actual microprocessor, these bits would come from the opcodes (or machine
codes) that form the actual low-level computer programming code. (Computer software
today is typically written in a high-level language like ‘C’, which is compiled into assembler
code. Assembler code can be directly translated into machine codes that cause the
microprocessor to perform particular functions).

ALU design should follow the same process as other bit-slice designs: first, define and
understand all inputs and outputs of a bit slice (i.e., prepare a detailed block diagram of the
bit slice); second, capture the required logical relationships in some formal method (e.g., a
truth table); third, find minimal circuits (by using K-maps or espresso) or write VHDL code;
and fourth, proceed with circuit design and verification.

3.4- 4 BIT ARITHMETIC AND LOGICAL UNIT

A sketch of top-level design will be derived based on that input/output characterization as

well as other requirements which are relevant to this stage as stated by the project
specifications. From the top-level, the design will be a circuit with three sets of inputs and
two sets of outputs.

PAGE \* MERGEFORMAT 2
Fig. Top-Level Design
As shown in the figure, the circuit accepts three 4-bit input vectors A, B, and S. More
specifically, A and B are 4-bit data input vectors and S is a 4-bit command input vector.
At the output end, the circuit will generate a 4-bit data result as well as two 1-bit status V
C A3-A0 R3-R0 B3-B0 4-bit ALU S3-S0 Figure 1.1. Top-Level Design 8 signals C and V
representing the Carry bit and Overflow bit, respectively for the operations. The
relationship between A, B, S, R, C, and V can be more specifically listed in the following
table.

Hex. code S0 S1 S2 S3 R Comments C V

0 0 0 0 0 A A is outputted 0 0

1 0 0 0 1 B Shifted 1 bit left circular shift 0 0

2 0 0 1 0 A-B Complement-Two Subtraction 0/1 0/1

3 0 0 1 1 B B is outputted 0 0

4 0 1 0 0 A+1 Incremented A 0/1 0/1

5 0 1 0 1 A-1 Decremented A 0/1 0/1

6 0 1 1 0 B+1 Incremented B 0/1 0/1

7 0 1 1 1 A+B Add A and B 0/1 0/1

8 1 0 0 0 B-1 Decremented B 0/1 0/1

9 1 0 0 1 A OR B bit by bit OR 0 0

A 1 0 1 0 A Shifted 1 bit right circular shift 0 0

B 1 0 1 1 A AND B bit by bit AND 0 0

C 1 1 0 0 A XOR B bit by bit XOR 0 0

D 1 1 0 1 Max (A, B) Select the maximum 0 0

E 1 1 1 0 -B Complement-Two Negation 0 0

F 1 1 1 1 -A Complement-Two Negation 0 0

Table- Input/Output Relationship

PAGE \* MERGEFORMAT 2
According to the input/output relationship listed in table 1.1, we can specify further the
way we interpret the meaning of the data input vectors A and B, and the data output
vector R respectively so that the corresponding operation is meaningful. This is
straightforward. For those operations other than addition, subtraction, increment, and
decrement (including Max (A,B)), A and B are considered as two sets of bits. For the
rests, A, B, and R are 4-bit signed numbers. In such circumstances, their most significant
bits (A3, B3, and R3) are sign bits.
Also note that we will use complement-two representations for negative numbers or in
cases that involve subtraction operation. One more thing important enough to be
mentioned is that we can’t assume each bit of our data inputs and command inputs which
are meaningful to the supposed operation will be available to the ALU simultaneously.
Hence, it is appropriate to implement some kind of circuit that can synchronize the inputs;
and make sure that our circuit is operating on the 9 correct data. At the output end, we
will do the same thing so that our output will always contain meaningful data.

Fig. Block Diagram of 4-bit ALU

The Arithmetic Unit:

PAGE \* MERGEFORMAT 2
figure 1. It has four full adder
circuits that constitute the 4 bit
adder and 4
multiplexers for choosing
multiple operations. There are
two 4 bit inputs A and
B and 4 bit output D.
The arithmetic operations in
the table can be implemented
in one composite
arithmetic circuit. The basic
component of an arithmetic
circuit is a full adder.

PAGE \* MERGEFORMAT 2
The arithmetic operations in
the table can be implemented
in one composite
arithmetic circuit. The basic
component of an arithmetic
circuit is a full adder.
By controlling the data input
to the adder it is possible to
obtain different types
of arithmetic operations. The
diagram of the 4-bit arithmetic
circuit is shown in
figure 1. It has four full adder
circuits that constitute the 4 bit
adder and 4
PAGE \* MERGEFORMAT 2
multiplexers for choosing
multiple operations. There are
two 4 bit inputs A and
B and 4 bit output D.
The arithmetic operations in
the table can be implemented
in one composite
arithmetic circuit. The basic
component of an arithmetic
circuit is a full adder.
By controlling the data input
to the adder it is possible to
obtain different types

PAGE \* MERGEFORMAT 2
of arithmetic operations. The
diagram of the 4-bit arithmetic
circuit is shown in
figure 1. It has four full adder
circuits that constitute the 4 bit
adder and 4
multiplexers for choosing
multiple operations. There are
two 4 bit inputs A and
B and 4 bit output D.
The arithmetic operations in
the table can be implemented
in one composite

PAGE \* MERGEFORMAT 2
arithmetic circuit. The basic
component of an arithmetic
circuit is a full adder.
By controlling the data input
to the adder it is possible to
obtain different types
of arithmetic operations. The
diagram of the 4-bit arithmetic
circuit is shown in
figure 1. It has four full adder
circuits that constitute the 4 bit
adder and 4
multiplexers for choosing
multiple operations. There are
two 4 bit inputs A and
PAGE \* MERGEFORMAT 2
B and 4 bit output D.
The arithmetic operations in
the table can be implemented
in one composite
arithmetic circuit. The basic
component of an arithmetic
circuit is a full adder.
By controlling the data input
to the adder it is possible to
obtain different types
of arithmetic operations. The
diagram of the 4-bit arithmetic
circuit is shown in

PAGE \* MERGEFORMAT 2
By controlling the data input
to the adder it is possible to
obtain different types
of arithmetic operations. The
diagram of the 4-bit arithmetic
circuit is shown in
figure 1. It has four full adder
circuits that constitute the 4 bit
adder and 4
multiplexers for choosing
multiple operations. There are
two 4 bit inputs A and
B and 4 bit output D.
The arithmetic operations in the table can be implemented in one composite arithmetic
circuit. The basic component of an arithmetic circuit is a full adder. By controlling the data
input to the adder it is possible to obtain different types of arithmetic operations. The
diagram of the 4-bit arithmetic circuit is shown in figure 5. It has four full adder circuits that
constitute the 4 bit adder and 4 multiplexers for choosing multiple operations. There are two
4 bit inputs A and B and 4 bit output D.

PAGE \* MERGEFORMAT 2
Fig. Schematic of 4-bit Arithmetic Unit

Logic Unit:

Logic micro-operations specify binary operations for strings of bits stored in registers. These
operations consider each bit of registers separately and treat them as binary variables. Figure
shows one stage of a circuit that generates the four basic logic micro operations. It consists
of 4 gates and a multiplexer each of the four logic operations is generated through a gate that
performs the required logic. The outputs of the gates are applied to the data inputs of the
multiplexer. The two selection inputs S1 and S2 choose one of the data inputs of the
multiplexers and direct it values to the output. Figure shows one typical stage of logical unit.
A and B are the 4-bit word inputs ALU A3, A2, A1, A0 and B3, B2, B1, B0 are the bits. A3
and B3 are the MSBs. S2, S1, S0 are the selection inputs. S2 selects the arithmetic operation
for ‘0’ and logic operation for ‘1’. S1, S0 are used to select various operations in arithmetic

PAGE \* MERGEFORMAT 2
and logic blocks. Cin is the input carry to arithmetic circuit. f3, f2, f1, f0 are the output bits.
Cout is the output carry.

Fig. Schematic of 1-bit Logic Unit

3.5- 8 BIT ARITHMETIC AND LOGICAL UNIT

By connecting eight 1-bit ALUs together, we obtain an 8-bit ALU. A single decoder can be
used to control all the 1-bit ALUs. There is no need to replicate this decoder eight times.The
last carried value can be used to detect overflows when performing a binary addition on the
two Bytes of data A and B.

PAGE \* MERGEFORMAT 2
Fig. 8-bit ALU

An 8-bit arithmetic logic unit (ALU) is a combinational circuit which operates on two 8-bit
input buses based on selection inputs. The ALU performs common arithmetic (addition and
subtraction) and logic (AND, INV, XOR, and OR) functions. These operations are common
to all computer systems and thus are an essential part of computer architecture.

Fig. Symbol of 8-bit ALU

For the operation of the 8-bit ALU, it will perform arithmetic (decrement, addition,
subtraction, increment) and logical (logical AND, logical XOR, logical OR, logical XNOR)
operations that are shown in Table 1, in which the three select signals will responsible to
select the operation. For the Increment operation, the input will be the logic ‘1’ and is
performed by adding ‘1’ to the addend. Besides, a complement of B will be implemented in
the ALU for the Subtraction operation. The input will be the logic ‘0’ for the Decrement
operation, and the operation will be similar to the subtraction operation [8][9]. Since the 8-
bit ALU consists of arithmetic and logical operations, the first 4-to-1 multiplexer and the full
adder will be considered the arithmetic part. In contrast, the second 4-to-1 multiplexer and
the simple logic gates will be considered as the logical part. For the arithmetic part, the first
PAGE \* MERGEFORMAT 2
4-to-1 multiplexer will be responsible for selecting the input according to the condition of
the selection lines and sending it to the full adder to compute the result.

For the logical part, the second 4-to-1 multiplexer is responsible for selecting either OR,
AND, XNOR, or XOR operations according to the condition of the selection lines. Lastly,
the 2-to-1 multiplexer will decide whether to perform arithmetic or logical operations and
sends it out.

Table. Truth Table of 8-bit ALU

An 8-bit Arithmetic Logic Unit (ALU) is a crucial component within a computer's central
processing unit (CPU), responsible for executing arithmetic and logical operations on 8-bit
binary data. Its design involves the integration of various logic gates and components to
perform a range of operations efficiently.
The ALU's primary functions include addition and subtraction. Addition is accomplished
using multiple full adders, while subtraction is typically implemented through two's
complement representation. Logical operations such as AND, OR, XOR, and NOT are
realized using basic logic gates like AND gates, OR gates, XOR gates, and inverters.
Logical operations, such as AND, OR, XOR, and NOT, are executed using combinations of
these basic gates. The ALU's ability to perform these logical operations is crucial for tasks
such as bitwise manipulation and decision-making in programming and data processing.
Multiplexers play a vital role in ALU design, enabling the selection of specific operations
based on control lines. These control lines dictate whether the ALU should execute addition,
subtraction, or logical operations.

PAGE \* MERGEFORMAT 2
The ALU's output often includes flags that convey important information about the result.
Common flags include the zero flag (indicating a zero result), the carry flag (highlighting a
carry-out from the most significant bit), and the overflow flag.

Fig. Block-level diagram of the 8-bit arithmetic logic unit

Control lines are crucial for specifying the ALU's operation mode and managing various
aspects, such as flag settings or clearances. They provide a means to customize the ALU's
behaviour according to the specific requirements of the CPU.
In addition to its computational functions, an 8-bit ALU contributes to the overall processing
power of a computer system by efficiently handling data manipulation and decision-making
processes. Its compact size and capability to process 8 bits of data in parallel make it a
fundamental building block for a wide range of computing tasks.
In summary, the 8-bit ALU serves as a fundamental computational unit in a computer's CPU,
executing arithmetic and logical operations in parallel for 8-bit binary data. Its design

PAGE \* MERGEFORMAT 2
intricacies involve a combination of logic gates, multiplexers, and control lines to achieve
versatility and efficiency in processing a diverse set of instructions and data types.

3.6- 16 BIT ARITHMETIC AND LOGICAL UNIT

This ALU operate on 16-bit input. It performs arithmetic and logical operations. This gives
appropriate output. 16-bit ALU performs 16 operations. There are 2 data inputs and one
select line. According to select line input appropriate operation is performed between 2
inputs. Output of this 16bit ALU is connected between ROM and RAM. A number of basic
arithmetic and bitwise logic functions are performed in the ALU. ALU can be used in
complex operations, system processing and execution of any program.

Fig. Symbol of 16-bit ALU

A 16-bit Arithmetic Logic Unit (ALU) is a crucial component of a computer's central

processing unit (CPU) responsible for executing arithmetic and logical operations on 16-bit
binary data. Its design involves the integration of various logic gates and components to
perform a range of operations efficiently.
PAGE \* MERGEFORMAT 2
The ALU's primary functions include addition and subtraction. Addition is accomplished
using multiple full adders, while subtraction is typically implemented through two's
complement representation. Logical operations such as AND, OR, XOR, and NOT are
realized using basic logic gates like AND gates, OR gates, XOR gates, and inverters.
Multiplexers play a vital role in ALU design, enabling the selection of specific operations
based on control lines. These control lines dictate whether the ALU should execute addition,
subtraction, or logical operations.
The ALU's output often includes flags that convey important information about the result.
Common flags include the zero flag (indicating a zero result), the carry flag (highlighting a
carry-out from the most significant bit), and the overflow flag (signifying overflow in signed
operations).
Control lines are crucial for specifying the ALU's operation mode and managing various
aspects, such as flag settings or clearances. They provide a means to customize the ALU's
behavior according to the specific requirements of the CPU.

Fig. Design of 16-bit ALU

In addition to its computational functions, a 16-bit ALU contributes to the overall processing
power of a computer system by efficiently handling data manipulation and decision-making
processes. Its larger bit width allows for more extensive data processing capabilities, making
it suitable for tasks that require increased precision and range. The architecture of a 16-bit
ALU involves a network of interconnected gates, including AND gates, OR gates, XOR
gates, and inverters.
Full adders, composed of these basic gates, form the backbone of the ALU's ability to
perform addition. These circuits are replicated to handle each bit of the operands, allowing

PAGE \* MERGEFORMAT 2
parallel processing for 16-bit data. In subtraction operations, the two's complement
representation is employed, involving inverting all bits of the subtrahend and adding 1 to the
result.

Logical operations, such as AND, OR, XOR, and NOT, are executed using combinations of
these basic gates. The ALU's ability to perform these logical operations is crucial for tasks
such as bitwise manipulation and decision-making in programming and data processing.
The inclusion of multiplexers facilitates the selection of specific inputs and outputs, enabling
the ALU to perform various operations based on the control signals it receives. This
flexibility is essential for accommodating different instructions and operands in a CPU.
The incorporation of flags, like the zero, carry, and overflow flags, enhances the ALU's
functionality. These flags provide status information about the result of an operation,
enabling the CPU to make decisions based on the outcome.
The 16-bit ALU serves as a fundamental computational unit in a computer's CPU, executing
arithmetic and logical operations in parallel for 16-bit binary data.
A functional table for a 16-bit Arithmetic Logic Unit (ALU) outlines the various operations
the ALU can perform and describes the outputs based on different combinations of inputs. In
a functional table, each row corresponds to a unique combination of inputs, and the columns
represent the different signals or outputs.

Table- Functional Table for 16-bit ALU

PAGE \* MERGEFORMAT 2
This functional table illustrates the outcomes of various operations for an imaginary 16-bit
ALU. In a real-world scenario, the table would be more extensive, covering additional
operations and considering other factors like signed and unsigned interpretations of numbers.

PAGE \* MERGEFORMAT 2
3.7- 32 BIT ARITHMETIC AND LOGICAL UNIT

A 32-bit Arithmetic Logic Unit (ALU) is a critical component of a computer's central

processing unit (CPU), designed to perform arithmetic and logical operations on 32-bit
binary data. The design of a 32-bit ALU involves a network of interconnected logic gates and
components tailored to efficiently handle larger data sets.
The primary functions of the ALU encompass addition and subtraction operations. Addition
is achieved through multiple full adders, with each bit processed in parallel. Subtraction
typically employs two's complement representation. Logical operations such as AND, OR,
XOR, and NOT are executed using fundamental logic gates, including AND gates, OR gates,
XOR gates, and inverters.

Fig. Symbol of 32-bit ALU

Multiplexers play a pivotal role in the design, allowing the ALU to select specific operations
based on control lines. These control lines determine whether the ALU should execute
addition, subtraction, or logical operations, providing versatility in handling various
instructions.
Output from the ALU often includes flags conveying critical information about the result.
Common flags include the zero flag, indicating a zero result; the carry flag, highlighting a
carry-out from the most significant bit; and the overflow flag, signaling overflow in signed
operations.Control lines are instrumental in specifying the ALU's operation mode and
managing aspects such as flag settings or clearances.

PAGE \* MERGEFORMAT 2
The 32-bit ALU's larger bit width enhances its capability to process more extensive data sets
in parallel, contributing to the overall computational power of the CPU. The architecture
involves replicating circuits, such as full adders, to accommodate each bit of the
operands.The 32-bit ALU we will build will be a component in the Beta processor we will
address in subsequent laboratories. The logic symbol for our ALU is shown to the right. It is
a combinational circuit taking two 32-bit data words A and B as inputs, and producing a 32-
bit output Y by performing a specified arithmetic or logical function on the A and B inputs.
The particular function to be performed is specified by a 6-bit control input, FN, whose value
encodes the function according to the following table:

Table- Functional Table for 32-bit ALU

Note that by specifying an appropriate value for the 6-bit FN input, the ALU can perform a
variety of arithmetic operations, comparisons, shifts, and bitwise Boolean combinations
required by our Beta processor.

The bitwise Boolean operations are specified by FN[5:4]=10; in this case, the remaining FN
bits are taken as entries in the truth table describing how each bit of Y is determined by the
corresponding bits of A and B, as shown to the right. Logical operations, crucial for tasks
like bitwise manipulation and decision-making in programming, are realized through
combinations of basic logic gates.

PAGE \* MERGEFORMAT 2
The three compare operations each produce a Boolean output. In these cases, Y[31:1] are all
zero, and the low-order bit Y[0] is a 0 or 1 reflecting the outcome of the comparison between
the 32-bit A and B operands. We can approach the ALU design by breaking it down into
subsystems devoted to arithmetic, comparison, Boolean, and shift operations as shown
below:

Fig. Design of 32-bit ALU

Efficiency is a key consideration in the design of a 32-bit ALU, focusing on factors such as
speed optimization, power consumption minimization, and compatibility with the broader
CPU architecture. The ALU's intricate design ensures its effectiveness in handling a wide
range of computational tasks, making it an indispensable component within modern
computer systems.

PAGE \* MERGEFORMAT 2
4.INTRODUCTION TO VERILOG HDL
4.1- Verilog Introduction
Verilog is a Hardware Description Language (HDL). It is a language used for describing a
digital system such as a network switch, a microprocessor, a memory, or a flip-flop. We can
describe any digital hardware by using HDL at any level. Designs described in HDL are
independent of technology, very easy for designing and debugging, and are normally more
useful than schematics, particularly for large circuits.
It is most commonly used in the design and verification of digital circuits at the register-
transfer level of Abstraction. It is also used in the verification of analog circuits and mixed-
signal circuits as well as in the design of genetic circuits. In 2009, the Verilog standard (IEEE
1364-2005) was merged into the System Verilog standard, creating IEEE Standard 1800-
2009. Since then, Verilog is officially part of the System Verilog language. The current
version is IEEE standard 1800-2017.
Hardware description languages such as Verilog are similar to software Programming
languages because they include ways of describing the propagation time and signal strengths
(sensitivity). There are two types of assignment operators; a blocking assignment (=), and a
non-blocking (<=) assignment. The non-blocking assignment allows designers to describe a
state-machine update without needing to declare and use temporary storage variables. Since
these concepts are part of Verilog's language semantics, designers could quickly write
descriptions of large circuits in a relatively compact and concise form. At the time of
Verilog's introduction (1984), Verilog represented a tremendous productivity improvement
for circuit designers who were already using graphical schematic capture software and
specially written software programs to document and simulate electronic circuits.

The designers of Verilog wanted a language with syntax similar to the C programming
language, which was already widely used in engineering software development. Like C,
Verilog is case-sensitive and has a basic preprocessor (though less sophisticated than that of
ANSI C/C++). Its control flow keywords (if/else, for, while, case, etc.) are equivalent, and
its operator precedence is compatible with C. Syntactic differences include: required bit-
widths for variable declarations, demarcation of procedural blocks (Verilog uses begin/end
instead of curly braces {}), and many other minor differences. Verilog requires that variables
be given a definite size. In C these sizes are inferred from the 'type' of the variable (for
instance an integer type may be 32 bits).

A Verilog design consists of a hierarchy of modules. Modules encapsulate design hierarchy,

and communicate with other modules through a set of declared input, output,
and bidirectional ports. Internally, a module can contain any combination of the following:
net/variable declarations (wire, reg, integer, etc.), concurrent and sequential statement
blocks, and instances of other modules (sub-hierarchies).

PAGE \* MERGEFORMAT 2
Sequential statements are placed inside a begin/end block and executed in sequential order
within the block. However, the blocks themselves are executed concurrently, making Verilog
a dataflow language.

Verilog's concept of 'wire' consists of both signal values (4-state: "1, 0, floating, undefined")
and signal strengths (strong, weak, etc.). This system allows abstract modelling of shared
signal lines, where multiple sources drive a common net. When a wire has multiple drivers,
the wire's (readable) value is resolved by a function of the source drivers and their strengths.

A subset of statements in the Verilog language are synthesizable. Verilog modules that
conform to a synthesizable coding style, known as RTL (register-transfer level), can be
physically realized by synthesis software. Synthesis software algorithmically transforms the
(abstract) Verilog source into a netlist, a logically equivalent description consisting only of
elementary logic primitives (AND, OR, NOT, flip-flops, etc.) that are available in a
specific FPGA or VLSI technology. Further manipulations to the netlist ultimately lead to a
circuit fabrication blueprint.

Verilog was developed to simplify the process and make the HDL more robust and flexible.
Today, Verilog is the most popular HDL used and practiced throughout the semiconductor
industry.

HDL was developed to enhance the design process by allowing engineers to describe the
desired hardware's functionality and let automation tools convert that behaviour into
actual hardware elements like combinational gates and sequential

Verilog is like any other hardware description language. It permits the designers to design
the designs in either Bottom-up or Top-down methodology.

Bottom-Up Design:

The traditional method of electronic design is bottom-up. Each design is performed at the
gate-level using the standards gates. This design gives a way to design new structural,
hierarchical design methods. Without these new design practices, it would be impossible to
handle the new complexity.

Top-Down Design:

The desired design-style of all designers is the top-down design. A real top-down design
allows early testing, easy change of different technologies, a structured system design and
offers many other advantages. But it is very difficult to follow a pure top-down design. Due
to this fact most designs are mix of both the methods, implementing some key elements of
both design styles. Verilog was developed to simplify the process and make the HDL more
robust and flexible. Today, Verilog is the most popular HDL used and practiced throughout
the semiconductor industry.

PAGE \* MERGEFORMAT 2
4.2-History of Verilog
Beginning
Verilog was created by Prabhu Goel, Phil Moorby and Chi-Lai Huang between late 1983 and
early 1984. Chi-Lai Huang had earlier worked on a hardware description LALSD, a
language developed by Professor S.Y.H. Su, for his PhD work. [4] The rights holder for this
process, at the time proprietary, was "Automated Integrated Design Systems" (later renamed
to Gateway Design Automation in 1985). Gateway Design Automation was purchased
by Cadence Design Systems in 1990. Cadence now has full proprietary rights to Gateway's
Verilog and the Verilog-XL, the HDL-simulator that would become the de facto standard for
the next decade. Originally, Verilog was only intended to describe and allow simulation; the
automated synthesis of subsets of the language to physically realizable structures (gates etc.)
was developed after the language had achieved widespread usage. Verilog is a portmanteau
of the words "verification" and "logic".

Verilog-95
With the increasing success of VHDL at the time, Cadence decided to make the language
available for open standardization. Cadence transferred Verilog into the public domain under
the Open Verilog International (OVI) (now known as Accellera) organization. Verilog was
later submitted to IEEE and became IEEE Standard 1364-1995, commonly referred to as
Verilog-95.

In the same time frame Cadence initiated the creation of Verilog-A to put standards support
behind its analog simulator Spectre. Verilog-A was never intended to be a standalone
language and is a subset of Verilog-AMS which encompassed Verilog-95.

Verilog 2001
Extensions to Verilog-95 were submitted back to IEEE to cover the deficiencies that users
had found in the original Verilog standard. These extensions became IEEE Standard 1364-
2001 known as Verilog-2001.

Verilog-2001 is a significant upgrade from Verilog-95. First, it adds explicit support for (2's
complement) signed nets and variables. Previously, code authors had to perform signed
operations using awkward bit-level manipulations (for example, the carry-out bit of a simple
8-bit addition required an explicit description of the Boolean algebra to determine its correct
value). The same function under Verilog-2001 can be more succinctly described by one of
the built-in operators: +, -, /, *, >>>. A generate–end generate construct (similar to VHDL's
generate–end generate) allows Verilog-2001 to control instance and statement instantiation
through normal decision operators (case–if–else). Using generate–end generate, Verilog-
2001 can instantiate an array of instances, with control over the connectivity of the
PAGE \* MERGEFORMAT 2
individual instances. File I/O has been improved by several new system tasks. And finally, a
few syntax additions were introduced to improve code readability (e.g. always, @*, named
parameter

override, C-style function/task/module header declaration).Verilog-2001 is the version of

Verilog supported by the majority of commercial EDA software packages.

Verilog 2005
Not to be confused with System Verilog, Verilog 2005 (IEEE- Standard 1364-2005) consists
of minor corrections, spec clarifications, and a few new language features (such as the uwire
keyword).A separate part of the Verilog standard, Verilog-AMS, attempts to integrate analog
and mixed signal modeling with traditional Verilog.

System Verilog
OpenVera, e-language encouraged the development of Superlog by Co-Design Automation
Inc (acquired by Synopsis). The foundations of Superlog and Vera were donated
to Accellera, which later became the IEEE standard P1800-2005: System Verilog.

System Verilog is a superset of Verilog-2005, with many new features and capabilities to aid
design verification and design modeling. As of 2009, the System Verilog and Verilog
language standards were merged into System Verilog 2009 (IEEE Standard 1800-2009). The
current version is IEEE standard 1800-2017.

4.3- Verilog Abstraction Levels

Verilog supports a design at many levels of abstraction, such as:

1. Behavioral Level
2. Register-Transfer Level
3. Gate Level
4. Switch Level

Behavioral level

The behavioral level describes a system by concurrent algorithms behavioral. Every

algorithm is sequential, which means it consists of a set of executed instructions one by one.
Functions, tasks, and blocks are the main elements. There is no regard for the structural
realization of the design. The module implementation is similar to C language programming
that includes algorithmic level implementation without worrying about hardware
implementation details.

PAGE \* MERGEFORMAT 2
Register-Transfer Level

Designs using the Register-Transfer or Data Flow Level specify a circuit's characteristics
using operations and the transfer of data between the registers. The modern definition of an
RTL code is Any code that is synthesizable is called RTL code. The module implementation
depends on data flow specification, how data flows and processes in the design circuit.

PAGE \* MERGEFORMAT 2
Gate Level

The characteristics of a system are described by logical links and their timing properties
within the logical level. All signals are discrete signals. They can only have definite logical
values (`0', `1', `X', `Z`). The usable operations are predefined logic primitives (basic gates).
Gate level modeling may not be the right idea for logic design. Gate level code is generated
using tools such as synthesis tools, and his netlist is used for gate-level simulation and
backend. The module implementation is similar to the gate-level design description in terms
of logic gates and interconnections between them.

Switch Level

The module implementation requires switch level knowledge to implement a design in terms
of storage nodes, switches. This is the lowest level of abstraction.

4.4-Lexical Tokens
Verilog language source text files are a stream of lexical tokens. A token consists of one or
more characters, and each single character is in exactly one token.

The basic lexical tokens used by the Verilog HDL are similar to those in C Programming
Language. Verilog is case sensitive. All the key words are in lower case. They are specified
as numbers, strings, identifiers, comments, delimiters, and keywords.

Numbers

You can specify a number in binary, octal, decimal or hexadecimal format. Negative
numbers are represented in 2’s compliment numbers. Verilog allows integers, real numbers
and signed & unsigned numbers. The syntax is given by − <size> <radix> <value>. Size or
unsized number can be defined in <Size> and <radix> defines whether it is binary, octal,
hexadecimal or decimal.

In case of unsized numbers, if base format is not specified, then it is treated as a decimal
format. Verilog provides two symbols ‘x’ to denote ‘unknown value’, and ‘z’ to denote ‘high
impedance value’. The underscore is used to separate bits for readability.

Negative numbers can be specified by using a minus sign before the size of a number and it
is stored as 2’s complement of the number. It is illegal to use minus sign between
<base_format> and <number>.

Example: -8’d5 (valid format and it is stored as 2’s complement of 5), 8’d-5 (Invalid format).

PAGE \* MERGEFORMAT 2
Strings

A string is a set of characters that are typically mentioned within double-quotes (” “). Each
character in a string requires 1 byte to store. They cannot be divided into multiple lines.
Example:
"Hello World" // Requires 11 bytes to store 11 characters.

Identifiers

Identifier is the name used to define the object, such as a function, module or register.
Identifiers should begin with an alphabetical characters or underscore characters. Ex. A_Z,
a_z.

Identifiers are a combination of alphabetic, numeric, underscore and $ characters. They can
be up to 1024 characters long.

The identifiers are the names given to the objects that can be referenced in the design.
1. They are case-sensitive and made up of alphanumeric characters (A to Z, a to z, 0 to 9),
the underscore ( _ ), and the dollar sign ($).
2. They cannot start with the dollar sign ($), and numbers.
3. Escaped identifiers begin with backslash \ character and end with whitespace (tab, space,
or newline). They are meant to process literally. Example: \x+y

Comments

There are two forms to represent the comments

1) Single line comments begin with the token // and end with carriage return.

Ex: //this is single line syntax

2) Multiline comments begins with the token /* and end with token */

Ex.: /* this is multiline Syntax*/

White Space

White spaces can contain characters for spaces, tabs, new-lines and form feeds. These
characters are ignored except when they serve to separate tokens.

White space characters are Blank space, Tabs, Carriage returns, New line, and Form feeds.

PAGE \* MERGEFORMAT 2
Example:
module tb;

reg [1:0] data; // observe spaces are given for indentation.

initial begin

$display("Hello\tWorld"); // \t is used between 'Hello' and 'World'.

end

endmodule

Operators

Operators are special characters used to put conditions or to operate the variables. There are
one, two and sometimes three characters used to perform operations on variables.

Ex- >, +, ~, &! =.

Verilog has three operator types: Unary, binary, and ternary

Operators Description Example

Unary Appear before the operand. Y = ~x;

Binary Appear between two operands. Y = x || y

Ternary Two separate operators appear to separate three operands Z = (a < b)?x:y;

Keywords

The keywords are special identifiers that are reserved to define the Verilog language
construct. They are in lowercase. Words that have special meaning in Verilog are called the
Verilog keywords. They should not be used as identifiers. Verilog keywords also include
compiler directives, and system tasks and functions.
Examples: module, end module, initial, always, begin, end, case, wire, while, nand, and, or,
assign, reg.

PAGE \* MERGEFORMAT 2
4.5-Data Types

A storage format having a specific range or type is called data type. They can be divided into
two groups.
1.Net type group: The net-type group represents physical connections between digital
circuits. Ex. wire, wand, wor, etc.
2.Variable type group: The variable type group represents the storage of values in digital
circuits. They are used as variables. Ex. reg, integer

Verilog supports 4 types of logic values as

Logic values Description

1 Logic one, true condition

0 Logic zero, false condition

x Unknown value

z High impedance

Nets
Nets represent physical connections between the digital circuits. They are declared using
keyword wire. The term net is not a keyword, it is a group of data types such as wire, wand,
wor, tri, etc.
Example: wire a; //one-bit value as a single net.

wire [5:0] a; //net as a vector

Note:
1. The net and wire terms are interchangeably used.
2. Usually, the default value of the net is z.

Registers
PAGE \* MERGEFORMAT 2
The registers represent data storage elements. It retains value till it is overridden. The
registers are like a placeholder, so they do not require a driver. Keyword used to declare a
register data type is reg.
Example: reg a; // single bit register

reg [5:0] a; // 6 bit register as a vector

Note:
1. Registers are similar to variables in the C language.
2. The default value of the reg is x.
Scalars and Vectors

Scalars: Single bit wire or register is known as a scalar.

Example: wire a;

reg a;

Vectors: The nets or registers can be declared as vectors to represent multiple bit widths. If
bit width is not specified, it is a scalar.

Example: wire [5:0] a;

reg [5:0] a;

Constants
The value of constants cannot be changed. It is read-only in nature.
Integer data type
The integers are general-purpose 32-bit register data types. They are declared by the ‘integer’
keyword.
integer count;

Real
The real data types can be constants or real register data types. They are declared using the
‘real’ keyword.
The real value is rounded off if they are assigned to integer data type.
Example: real data = 3.14;

String

PAGE \* MERGEFORMAT 2
n ordered collection of characters is called a string. They can be stored in reg data type. Each
character in a string requires 1 byte (8 bits) for storage and is typically mentioned within
double-quotes (” “).
Example: reg [8*11:0] name = "Hello World"; // String "Hello World"

//requires 11 bytes space.

Time
Verilog provides a time register to store simulation time. A time register is declared using
the ‘time’ keyword and it has a width of at least 64 bits depending on the simulator and
machine used.
Example: time curr_time; // curr_time is a time variable.

Arrays
Verilog allows arrays of reg, time, integer, and vector register data types.
Example:
integer count [0:5];

count[2] // To access 2nd element in an array

integer two_D_arr [3:0][3:0]; // illegal

time timestamp[1:5]; // array of 5 timestamp variables.

reg [3:0] var[0:7]; // Array of 8 var and each var has 4 as bit width

Note:
1. Verilog does not allow the use of multidimensional arrays.
2. Arrays for real variables are also not allowed in Verilog.
Memories
Verilog provides a facility to model register memories like ROM or RAM as an array of
registers. Each array element in an array is called a word.
reg mem [0:511]; // Memory mem with 512 1-bit word.

reg [3:0] mem [0:511]; // Memory mem with 512 4-bit words

Input, Output, Inout

These keywords are used to declare input, output and bidirectional ports of a task or module.
Here input and inout ports, which are of wire type and output port is configured to be of
wire, reg, wand, wor or tri type. Always, default is wire type.

PAGE \* MERGEFORMAT 2
Supply0, Supply1

Supply0 define wires tied to logic 0 (ground) and supply1 define wires tied to logic 1
(power).

Time

Time is a 64-bit quantity that can be used in conjunction with the $time system task to hold
simulation time. Time is not supported for synthesis and hence is used only for simulation
purposes.

Parameter

A parameter is defining a constant which can be set when you use a module, which allows
customization of module during the instantiation process.

4.6-Operators
Arithmetic Operators

These operators is perform arithmetic operations. The + and −are used as either unary (x) or
binary (z−y) operators.

The Operators which are included in arithmetic operation are −

+ (addition), −(subtraction), * (multiplication), / (division), % (modulus)

Example −

parameter v = 5;
reg[3:0] b, d, h, i, count;
h = b + d;
i = d - v;
cnt = (cnt +1)%16; //Can count 0 thru 15.

Relational Operators

These operators compare two operands and return the result in a single bit, 1 or 0.

Wire and reg variables are positive. Thus (−3’d001) = = 3’d111 and (−3b001)>3b110.

The Operators which are included in relational operation are −

 == (equal to)
 != (not equal to)
 > (greater than)
PAGE \* MERGEFORMAT 2
>= (greater than or equal to)
 < (less than)
 <= (less than or equal to)
Example

if (z = = y) c = 1;
else c = 0; // Compare in 2’s compliment; d>b
reg [3:0] d,b;

if (d[3]= = b[3]) d[2:0] > b[2:0];

else b[3];
Equivalent Statement
e = (z == y);

Bit-wise Operators

Bit-wise operators which are doing a bit-by-bit comparison between two operands.The
Operators which are included in Bit wise operation are −

& (bitwise AND)

 | (bitwiseOR)
 ~ (bitwise NOT)
 ^ (bitwise XOR)
 ~^ or ^~(bitwise XNOR)
Example

module and2 (d, b, c);

input [1:0] d, b;
output [1:0] c;
assign c = d & b;
end module

Logical Operators

Logical operators are bit-wise operators and are used only for single-bit operands. They
return a single bit value, 0 or 1. They can work on integers or group of bits, expressions and
treat all non-zero values as 1. Logical operators are generally, used in conditional statements
since they work with expressions.

The operators which are included in Logical operation are –

 ! (logical NOT)
 && (logical AND)
PAGE \* MERGEFORMAT 2
|| (logical OR)
Example

wire[7:0] a, b, c; // a, b and c are multibit variables.

reg x;

if ((a == b) && (c)) x = 1; //x = 1 if a equals b, and c is nonzero.

else x = !a; // x =0 if a is anything but zero.

Reduction Operators

Reduction operators are the unary form of the bitwise operators and operate on all the bits of
an operand vector. These also return a single-bit value.

The operators which are included in Reduction operation are −

& (reduction AND)

 | (reduction OR)
 ~& (reduction NAND)
 ~| (reduction NOR)
 ^ (reduction XOR)
 ~^ or ^~(reduction XNOR)
Example

Module chk_zero (x, z);

Input [2:0] x;
Output z;
Assign z = & x; // Reduction AND
End module

Shift Operators

Shift operators, which are shifting the first operand by the number of bits specified by
second operand in the syntax. Vacant positions are filled with zeros for both directions, left
and right shifts (There is no use sign extension).

The Operators which are included in Shift operation are −

<< (shift left)

 >> (shift right)
Example

Assign z = c << 3; /* z = c shifted left 3 bits;

Vacant positions are filled with 0’s */

PAGE \* MERGEFORMAT 2
Concatenation Operator

The concatenation operator combines two or more operands to form a larger vector.

The operator included in Concatenation operation is − { }(concatenation)

Example

wire [1:0] a, h; wire [2:0] x; wire [3;0] y, Z;

assign x = {1’b0, a}; // x[2] = 0, x[1] = a[1], x[0] = a[0]
assign b = {a, h}; /* b[3] = a[1], b[2] = a[0], b[1] = h[1],
b[0] = h[0] */
assign {cout, b} = x + Z; // Concatenation of a result

Replication Operator

The replication operator are making multiple copies of an item.

The operator used in Replication operation is − {n{item}} (n fold replication of an item)

Example

Wire [1:0] a, f; wire [4:0] x;

Assign x = {2{1’f0}, a}; // Equivalent to x = {0,0,a }
Assign y = {2{a}, 3{f}}; //Equivalent to y = {a,a,f,f}
For synthesis, Synopsis did not like a zero replication.

For example:-
Parameter l = 5, k = 5;
Assign x = {(l-k){a}}

Conditional Operator

Conditional operator synthesizes to a multiplexer. It is the same kind as is used in C/C++ and
evaluates one of the two expressions based on the condition.

The operator used in Conditional operation is −

(Condition) ? (Result if condition true) −

(result if condition false)

Example

Assign x = (g) ? a : b;

PAGE \* MERGEFORMAT 2
Assign x = (inc = = 2) ? x+1 : x-1;
/* if (inc), x = x+1, else x = x-1 */

Operands

Literals

Literals are constant-valued operands that are used in Verilog expressions. The two
commonly used Verilog literals are −

String − A string literal operand is a one-dimensional array of characters, which are

enclosed in double quotes (" ").
 Numeric − A constant number operand is specified in binary, octal, decimal or
hexadecimal Number.
Example

n − integer representing number of bits

F − one of four possible base formats −

b for binary, o for octal, d for decimal, h for hexadecimal.

“time is” // string literal

267 // 32-bit decimal number
2’b01 // 2-bit binary
20’hB36F // 20-bit hexadecimal number
‘062 // 32-bit octal number

Wires, Regs, and Parameters

Wires, regs and parameters are the data types used as operands in Verilog expressions.

Bit-Selection “x[2]” and Part-Selection “x[4:2]”

Bit-selects and part-selects are used to select one bit and a multiple bits, respectively, from a
wire, reg or parameter vector with the use of square brackets “[ ]”. Bit-selects and part-
selects are also used as operands in expressions in the same way that their main data objects
are used.

Example

reg [7:0] x, y;
reg [3:0] z;
reg a;

PAGE \* MERGEFORMAT 2
a = x[7] & y[7]; // bit-selects
z = x[7:4] + y[3:0]; // part-selects

Function Calls

In the Function calls, the return value of a function is used directly in an expression without
the need of first assigning it to a register or wire. It just place the function call as one of the
type of operands.it is needful to make sure you are knowing the bit width of the return value
of function call.

Example
Assign x = y & z & chk_yz(z, y); // chk_yz is a function

. . ./* Definition of the function */

Function chk_yz; // function definition
Input z,y;
chk_yz = y^z;
End function

4.7-Modules
A Module is a basic building design block in Verilog and it can be an element that
implements necessary functionality. It can also be a collection of lower-level design blocks.
As a part of defining a module, it has a module name, port interface, and parameters
(optional). The port interface i.e. inputs and outputs is used to connect the high-level module
with the lower one and hides internal implementation.

Module Declaration

In Verilog, A module is the principal design entity. This indicates the name and port list
(arguments). The next few lines which specifies the input/output type (input, output or inout)
and width of the each port.

The default port width is only 1 bit. The port variables must be declared by wire, wand., reg.
The default port variable is wire. Normally, inputs are wire because their data is latched
outside the module. Outputs are of reg type if their signals are stored inside.

Example

module sub_add(add, in1, in2, out);

input add; // defaults to wire
input [7:0] in1, in2; wire in1, in2;

output [7:0] out; reg out;

PAGE \* MERGEFORMAT 2
... statements ...
End module

Ports
An interface to communicate with other modules or a testbench environment is called a port.
In simple words, the input/ output pins of digital design are known as ports. This interface is
termed a port interface or port list. Since the port list is available for connection, internal
design implementation can be hidden from other modules or an environment.
Verilog keywords used for port declaration are as follows:

Port Type Keywords used Description

Input port input To receive signal values from another module

Output port output To send signal values to another module

Bidirectional port inout To send or receive signal values to another module.

Module instantiation

While designing complex digital circuits, usually it is split into various modules that connect
to have a top-level block. Thus, Verilog supports a hierarchical design methodology. When a
module is instantiated, a unique object is created and it has a unique name. Similarly, a top-
level design can also be instantiated while creating a testbench.
A mechanism for connecting the port to the external signals. When a module is instantiated
in the top-level hierarchy or top-level design in a testbench, any one of the following
methods can be used.

PAGE \* MERGEFORMAT 2
4.8-Writing a testbench in Verilog

i. Declare a testbench as a module.

Example: module <testbench_name>;

module mux_tb;

ii. Declare set signals that have to be driven to the DUT. The signals which are connected to
the input of the design can be termed as ‘driving signals’ whereas the signals which are
connected to the output of the design can be termed as ‘monitoring signals’. The driving
signal should be of reg type because it can hold a value and it is mainly assigned in a
procedural block (initial and always blocks). The monitoring signals should be of net (wire)
type that get value driven by the DUT.
Note: The testbench signal nomenclature can be different the DUT port
Example: reg i0, i1, sel; // declaration.

wire y;

iii. Instantiate top-level design and connect DUT port interface with testbench variables or
signals.
Example: <dut_module> <dut_inst> (<TB signals>)

mux_2_1 mux(.sel(sel), .i0(i0), .i1(i1), .y(y));

mux_2_1 mux(sel, i0, i1, y);

iv. Use an initial block to set variable values and it can be changed after some delay based on
the requirement. The initial block execution starts at the beginning of the simulation and
updated values will be propagated to an input port of the DUT. The initial block is also used
to initialize the variables in order to avoid x propagation to the DUT.
Initialize clock and reset variables.
Example:

initial begin

clock = 0;

reset = 0;

end

PAGE \* MERGEFORMAT 2
For 2:1 MUX example,
initial begin

// To print the values.

$monitor("sel = %h: i0 = %h, i1 = %h --> y = %h", sel, i0, i1, y);

i0 = 0; i1 = 1;

sel = 0;

#1;

sel = 1;

end

v. An always block can also be used to perform certain actions throughout the simulation.
Example: Toggling a clock
always #2 clock = ~ clock;

In the above example, the clock is not used in the DUT, so we will not be declaring or using
it.
vi. The system task $finish is used to terminate the simulation based on the requirement.
vii. The end module keyword is used to complete the testbench structure.

4.9-Gate Level Modeling

The module implementation is similar to the gate-level design description in terms of logic
gates and interconnections between them. It is a low-level abstraction that describes design
in terms of gates.
Note:
1.bufif1 and notif1 provide ‘Z output when a control signal is 0.
2. bufif0 and notif0 provide ‘Z output when a control signal is 1.

The three types of delays can be specified for delays from inputs to the primitive gate output
as the rise, fall, and turn-off delays.
Verilog supports some predefined basic gates (commonly knowns as primitives) as follows

and and g(out, i1, Performs AND operation on two or more inputs
PAGE \* MERGEFORMAT 2
i2, …)

or g(out, i1, i2,

or Performs OR operation on two or more inputs
…)

xor g(out, i1,

xor Performs XOR operation on two or more inputs
i2, …)

nand g(out, i1,

nand Performs NAND operation on two or more inputs
i2, …)

nor g(out, i1,

nor Performs NOR operation on two or more inputs
i2, …)

xnor g(out, i1,

xnor Performs XNOR operation on two or more inputs
i2, …)

The buffer (buf) passes input to the output as it is. It has

buf buf g(out, in)
only one scalar input and one or more scalar outputs.

The not passes input to the output as an inverted version.

not not g(out, in) It has only one scalar input and one or more scalar
outputs.

It is the same as buf with additional control over the buf

bufif1 g(out,
bufif1 gate and drives input signal only when a control signal is
in, control)
1.

It is the same as not having additional control over the not

notif1 g(out,
notif1 gate and drives input signal only when a control signal is
in, control)
1.

It is the same as buf with additional inverted control over

bufif0 g(out,
bufif0 the buf gate and drives input signal only when a control
in, control)
signal is 0.

It is the same as not with additional inverted control over

notif0 g(out,
notif0 the not gate and drives input signal only when a control
in, control)
signal is 0.

Example for Gate level Modeling:

module gate_modeling(
PAGE \* MERGEFORMAT 2
input i1, i2, ctrl,

output o_and, o_or,

output o_nand, o_nor,

output o_xor, o_xnor,

output o_buf, o_not,

output o_bufif1, o_notif1,

output o_bufif0, o_notif0

);

// Gate types

and a1 (o_and , i1, i2);

or o1 (o_or , i1, i2);

nand na1(o_nand, i1, i2);

nor no1(o_nor , i1, i2);

xor x1 (o_xor , i1, i2);

xnor xn1(o_xnor, i1, i2);

// buffer and not

buf(o_buf, i1);

not(o_not, i1);

// buffer and not with additional control

bufif1(o_bufif1, i1, ctrl);

notif1(o_notif1, i1, ctrl);

bufif0(o_bufif0, i1, ctrl);

notif0(o_notif0, i1, ctrl);

endmodule

PAGE \* MERGEFORMAT 2
4.10-Data flow Modeling

The data flow modeling provides a way to design circuits depending on how data flow
between the registers and how data is processed.
Continuous assignment
In data flow modeling, a continuous assignment is used to drive a value to a net or wire. A
continuous assignment statement is represented by an ‘assign’ statement.
syntax:
assign <drive_strength> <expression or net> = <delay> <constant value or expression>

drive_strength: driven strength on a wire. It is used to resolve conflict when two or more
assignments drive the same net or wire.
delay: to specify a delay in an assignment in terms of time units (similar to the delay in gate
modeling). It is useful to mimic real-time circuit behavior.
Note:
1. The drive_strength and delay both are optional to use. The R.H.S. expression is
evaluated and assigned to the L.H.S. expression.
2. Since ‘assign’ statements are always active, they are known as continuous
assignments.
3. In an implicit continuous assignment, Verilog provides flexibility to combine
net declaration and assign statement. Both regular and implicit continuous
assignments have no difference in terms of outcome.

Data flow Modeling Example:

module dataflow_modeling(

input i1, i2,

output [4:0] result

);

assign result[0] = i1^i2;

assign #5 result[1] = i1^i2;

assign result[3:2] = {i1, i2};

endmodule

PAGE \* MERGEFORMAT 2
4.11-Behavioral Modeling

On increasing digital design complexity, the designer has to find out the optimum algorithm
and architecture to implement in hardware. Hence, the designer needs a way that can
describe the design as per algorithm behavior. The Verilog provides the facility to represent
the behavior of design at a high-level abstraction similar to C language programming.
Structured Procedure blocks
Verilog provides two structured procedural blocks.
1. initial block
2. always block

initial block in Verilog

The initial block executes only once and starts its execution at the beginning of a simulation.
It is a non-synthesizable block and it can not contribute to building design schematic
development.
1. Multiple statements are enclosed within ‘begin’ and ‘end’ keywords and
statements are executed sequentially.
2. Multiple initial blocks are valid in a module and each block executes at zero
simulation time. An individual block may finish their execution independently.
3. Nested initial blocks are not valid.
4. It can execute in zero time or take some time depending on the delay involved.
Syntax:
initial

initial begin

...

end

Use of initial block:

1. Initialize variables
2. Implementing a testbench code

PAGE \* MERGEFORMAT 2
3. Monitoring waveforms and other processes that are executed only once during the entire
simulation.
4. Drive ports with specific values.

always block in Verilog

The always block starts its execution at the beginning of the simulation and continues to
execute in a loop.

1. Similar to the initial block, the always block is executed sequentially.

2. Based on changes in single or multiple signals in the sensitivity list, always block is
executed.
3. If no sensitivity list is specified, always block executes repeatedly throughout the
simulation.
4. Multiple always blocks are valid in a module and each block executes independently.
5. Nested always blocks are not valid.

Syntax:
always

always @(<sensitivity_list>)

always @(<sensitivity_list>) begin

...

end

Use of always block

1. To model repetitive activity in a digital design. Ex. clock generation.

2. To implement combinational and sequential elements in digital circuits.

PAGE \* MERGEFORMAT 2
4.12- Procedural Assignments
In the case of procedural assignment, the LHS variable remains unchanged until the same
value is updated by the next procedural statement.
Syntax:
<variable> = <expression or value>

1. RHS expression can be a value or an expression that evaluates to a value.

2. LHS expression can be reg, integer, real, time-variable, or memory element.
3. The procedural assignments can be placed inside procedural blocks like initial and
always blocks. They can also be placed inside tasks and functions.
4. The procedural assignments can also be placed directly inside a module while declaring a
variable. (Ex. reg [3:0] i_data = 4’h7)
There are two types of procedural assignments and both of them are widely used in the
designs written in the Verilog language.
1. Blocking assignments
2. Non-blocking assignments
Blocking Assignments
The blocking assignment statements are executed sequentially by evaluating the RHS
operand and finishes the assignment to LHS operand without any interruption from another
Verilog statement. Hence, it blocks other assignments until the current assignment completes
and is named as “blocking assignment”.
An equal ‘=’ is used as a symbol for the blocking assignment operator.

Syntax:
Variable> = <expression or value>

Example:
module blocking;

reg [3:0] data = 4'h4;

real r_value;

integer i_value;

time T;

initial begin

PAGE \* MERGEFORMAT 2
$monitor("At time T = %0t: data = %0d, r_value = %0f, i_value = %0h", T, data, r_value,
i_value);

r_value = 3.14;

i_value = 4;

#2 data = 4'h5;

#3 data = 'd7;

i_value = 10;

i_value = 6;

$finish;

end

always #1 T = $time;

endmodule

Non-Blocking Assignments

The non-blocking assignment statement starts its execution by evaluating the RHS operand
at the beginning of a time slot and schedules its update to the LHS operand at the end of a
time slot. Other Verilog statements can be executed between the evaluation of the RHS
operand and the update of the LHS operand. As it does not block other Verilog statement
assignments, it is called a non-blocking assignment.
A less than or equal to ‘<=’ is used as a symbol for the non-blocking assignment operator.
Note:
1. If <= symbol is used in an expression then it is interpreted as a relational operator.
2. If <= symbol is used in an assignment then it is interpreted as a non blocking
operator.
Example:
module nonblocking;

reg [3:0] data = 4'h4;

real r_value;

integer i_value;

time T;
PAGE \* MERGEFORMAT 2
initial begin

$monitor("At time T = %0t: data = %0d, r_value = %0f, i_value = %0h", T, data, r_value,
i_value);

r_value <= 3.14;

i_value <= 4;

#2 data <= 4'h5;

data <= #3 'd7;

i_value <= 10;

i_value <= 6;

#4 $finish;

end

always #1 T = $time;

endmodule

4.13-if statement in Verilog

Verilog supports ‘if’, ‘else if’, ‘else’ same as other programming languages.
The ‘If’ statement is a conditional statement based on which decision is made whether to
execute lines inside if block or not.
The begin and end are required in case of multiple lines present in ‘if’ block. For single-line
inside if statement may not require ‘begin..end’
The ‘if’ statement returns true if the expression calculates its value as 1 otherwise, for 0, x, z
values ‘if’ block will not be executed.
Syntax:
if(<condition>) begin

...

end

The else if or else statement

In case, ‘if’ statement does not hold true, ‘else if’ or ‘else’ will be executed.

For any condition hold true in ‘else if’ statement, subsequent ‘else if’ or ‘else’ statement will
not be checked.
PAGE \* MERGEFORMAT 2
Syntax: ‘else if’ and ‘else’ condition
if(<condition>) begin

...

end

else if(<condition>) begin

...

end

else if(<condition>) begin

...

end

else begin

...

end

Syntax: if and else condition

if(<condition>) begin

...

end

else begin

...

end

Example:
module if_example;

initial begin

int a, b;

a = 10;

b = 20;

PAGE \* MERGEFORMAT 2
if(a>b)

$display("a is greater than b");

else if(a<b)

$display("a is less than b");

else

$display("a is equal to b");

end

endmodule

4.14-Case statement in Verilog

The case statement has a given expression and it is checked with the expression (case item)
mentioned in the list in the written order and if it matches then the statement or group of
statements are executed. If it does not match with any of the written expressions of the list
then, the default statement will be executed.
If the ‘default’ statement is not given and the given expression is not matched with any
expression in the list, then the case statement evaluation will exit.
Verilog case statement uses case, endcase, and default keywords.
Syntax:
case(<expression>)

<case_item1>:

<case_item2>:

<case_item3>:

<case_item4>: begin

...

End

default:

endcase

Note:

PAGE \* MERGEFORMAT 2
1. The default statement is not mandatory. There must be only one default statement for the
single case statement.
2. The nested case statement is allowed.
3. Verilog case statements work similarly as switch statements in C language.
4. An expression inside a case statement can not use <= (relational operator).
5. The === operator is used instead of == operator in case statement comparison. I.e. case
statement checks for 0, 1, x and z values in the expression explicitly.

4.15-Loops in Verilog

A loop is an essential concept of any programming language. The loop is useful to read/
update an array content, execute a few statements multiple times based on a certain
condition. All looping statements can only be written inside procedural (initial and always)
blocks. In Verilog, we will discuss the following loop blocks.
1. For loop
2. While loop
3. Forever loop
4. Repeat loop
In all supported loops, begin and end keywords are used to enclose multiple statements as a
single block. A begin and end keywords are optional if the loop encloses a single statement.

For loop
The for loop iterates till the mentioned condition is satisfied. The execution of for loop
depends on –
1. Initialization
2. Condition
3. Update
Syntax:
for (<initialization>; <condition>; <update>) begin

...

end

While loop

A while loop is a control flow statement that executes statements repeatedly if the condition
holds true else loop terminates

Syntax:

PAGE \* MERGEFORMAT 2
while(<condition>) begin

...

end

Forever Loop

As the name suggests, a forever loop runs indefinitely. To terminate the loop, a ‘disable’
statement can be used. The forever loop is the same as the while loop with an expression that
holds always true.
The forever loop must be used with a timing control construct otherwise the simulator will
get stuck in a zero-delay simulation time loop.
Syntax:
forever begin

...

end

Repeat loop

A repeat loop is used to execute statements a given number of times.

Syntax:
repeat(<number>) begin // <number> can be variable or fixed value

...

end

4.16- Verilog blocks

The Verilog blocks are nothing but a group of statements that acts as one. The multiple
statements are grouped together using ‘begin’ and ‘end’ keywords. Verilog classifies blocks
into two types.
They are:
1. Sequential blocks
2. Parallel blocks

Sequential blocks Parallel blocks

PAGE \* MERGEFORMAT 2
The sequential block executes a group of The parallel block executes a group of
statements (blocking assignment statement) statements concurrently as their
in a sequential manner in which they are execution starts at the same simulation
specified. time.

Keywords used: begin and end Keywords used: fork and join

4.17- Switch Level Modeling

The switch level modeling is used to model digital circuits at the MOS level transistor. In
this era, digital circuits have become more complex and involve millions of transistors, so
modeling at the transistor level is rarely used by the designer. Hence, mostly higher
abstraction levels are used for design description.
NMOS and PMOS switches
The keywords ‘nmos’ and ‘pmos’ are used to model NMOS and PMOS respectively.

Example: module switch_modeling (input d_in, ctrl,output p_out, n_out);

pmos p1(p_out, d_in, ctrl);
nmos n1(n_out, d_in, ctrl);
endmodule

CMOS switch

A CMOS is modeled with a combination of NMOS and PMOS devices.

Example: module cmos_modeling ( input d_in, p_ctrl, n_ctrl,output out);
cmos p1(out, d_in, p_ctrl, n_ctrl);
endmodule

Power and ground

The power (Vcc) and ground (GND) sources need to be defined in transistor level modeling
to provide the supply to the signals.
Example: module power_modeling ( output vcc, gnd);
Supply1 high;
supply0 low;
assign vcc = high;
assign gnd = low;
endmodule

Bidirectional switches
PAGE \* MERGEFORMAT 2
As of now, we have discussed unidirectional switches like PMOS, NMOS, and CMOS that
conduct from drain to source. Sometimes in the design, there is a need to have a bi-
directional switch that can be driven from any of the device sides.
Keywords used: tran, tranif0, and tranif1

Resistive switches
The resistive switches provide a high impedance from source to drain with a reduction in
signal strength as compared to regular switches. All PMOS, NMOS, CMOS, and
bidirectional switches can be modeled as resistive devices.
Keyword used: ‘r’ as a prefix to the regular switches.

4.18- Tasks and Functions in Verilog

A function or task is a group of statements that performs some specific action. Both of them
can be called at various points to perform a certain operation. They are also used to break
large code into smaller pieces to make it easier to read and debug.
Functions in Verilog
A function that does not consume simulation time, returns a single value or an expression,
and may or may not take arguments.
Keywords used: function and endfunction.
Syntax:
// Style 1

function <return_type> <function_name> (input <port_list>,inout <port_list>, output

<port_list>);

...

return <value or expression>

endfunction

// Style 2

function <return_type> <function_name> ();

input <port_list>;

inout <port_list>;

PAGE \* MERGEFORMAT 2
output <port_list>;

...

return <value or expression>

endfunction

Tasks in Verilog
A task that may or may not consume simulation time, returns values
as output or inout argument type, and may or may not take arguments.
Keywords used: task and endtask.
Syntax:
// Style 1

task <task_name> (input <port_list>, inout <port_list>, output <port_list>);

...

endtask

// Style 2

task <task_name> ();

input <port_list>;

inout <port_list>;

output <port_list>;

...

endtask

PAGE \* MERGEFORMAT 2
5.INTRODUCTION TO XILINX

5.1-Xilinx Technology
Xilinx, Inc. was an American technology and semiconductor company that primarily
supplied programmable logic devices. The company is known for inventing the first
commercially viable field-programmable gate array (FPGA) and creating the first fabless
manufacturing model.

Xilinx was co-founded by Ross Freeman, Bernard Vonder schmitt and James V Barnett II in
the year 1984 and the company went public on the NASDAQ in the year
1990. AMD announced its acquisition of Xilinx in October 2020 and the deal was completed
on February 14, 2022, through an all-stock transaction worth an estimated $60 billion. Xilinx
remained a wholly owned subsidiary of AMD until the brand was phased out in June 2023,
with Xilinx's product lines now branded under AMD.

Xilinx was founded in Silicon Valley in 1984 and is headquartered in San Jose, United
States, with additional offices in Longmont, United States; Dublin,
Ireland; Singapore; Hyderabad, India; Beijing, China; Shanghai, China; Brisbane,
Australia, Tokyo, Japan and Yerevan, Armenia.

According to Bill Carter, former CTO and current fellow at Xilinx, the choice of the name
Xilinx refers to the chemical symbol for silicon Si. The "linx" represents programmable links
that connect programmable logic blocks together. The 'X's at each end represent the
programmable logic blocks.

Xilinx sells a broad range of FPGAs, complex programmable logic devices (CPLDs), design
tools, intellectual property and reference designs. Xilinx customers represent just over half of
the entire programmable logic market, at 51%.Altera (now subsidiary of Intel) is Xilinx's
strongest competitor with 34% of the market. Other key players in this market
are Actel (now subsidiary of Microsemi), and Lattice Semiconductor.

5.2-History of Xilinx
Early History

Ross Freeman, Bernard Vonder schmitt and James V Barnett II—all former employees
of Zilog, an integrated circuit and solid-state device manufacturer—co-founded Xilinx in
1984 with headquarters in San Jose, USA.

While working for Zilog, Freeman wanted to create chips that acted like a blank tape,
allowing users to program the technology themselves. "The concept required lots
of transistors and, at that time, transistors were considered extremely precious—people
thought that Ross's idea was pretty far out", said Xilinx Fellow Bill Carter, hired in 1984 to
design ICs as Xilinx's eighth employee.

PAGE \* MERGEFORMAT 2
It was at the time more profitable to manufacture generic circuits in massive volumes than
specialized circuits for specific markets. FPGAs promised to make specialized circuits
profitable.

Freeman could not convince Zilog to invest in FPGAs to chase a market then estimated at
$100 million, so he and Barnett left to team up with Vonderschmitt, a former colleague.
Together, they raised $4.5 million in venture funding to design the first commercially viable
FPGA. They incorporated the company in 1984 and began selling its first product by 1985.

By late 1987, the company had raised more than $18 million in venture capital (equivalent to
$46.37 million in 2022) and was making nearly $14 million a year.

Expansion
From 1988 to 1990, the company's revenue grew each year, from $30 million to $100
million. During this time, Monolithic Memories Inc. (MMI), the company which had been
providing funding to Xilinx, was purchased by AMD. As a result, Xilinx dissolved the deal
with MMI and went public on the NASDAQ in 1989. The company also moved to a
144,000-square-foot (13,400 m2) plant in San Jose, California, to handle increasingly large
orders from HP, Apple Inc., IBM and Sun Microsystems.

Other FPGA makers emerged in the mid-1990s. By 1995, the company reached $550 million
in revenue. Over the years, Xilinx expanded operations to India, Asia and Europe.

Xilinx's sales rose to $2.53 billion by the end of its fiscal year 2018. Moshe Gavrielov –
an EDA and ASIC industry veteran who was appointed president and CEO in early 2008 –
introduced targeted design platforms that combine FPGAs with software, IP cores, boards
and kits to address focused target applications. These targeted design platforms are an
alternative to costly application-specific integrated circuits (ASICs) and application-specific
standard products (ASSPs).

On January 4, 2018, Victor Peng, the company's COO, replaced Gavrielov as CEO.

Recent History
In 2011, the company introduced the Virtex-7 2000T, the first product based on 2.5D stacked
silicon (based on silicon interposer technology) to deliver larger FPGAs than could be built
using standard monolithic silicon. Xilinx then adapted the technology to combine formerly
separate components in a single chip, first combining an FPGA with transceivers based on
heterogeneous process technology to boost bandwidth capacity while using less power.

According to former Xilinx CEO Moshe Gavrielov, the addition of a heterogeneous

communications device, combined with the introduction of new software tools and the Zynq-
7000 line of 28 nm SoC devices that combine an ARM core with an FPGA, are part of
shifting its position from a programmable logic device supplier to one delivering “all things
programmable”.

PAGE \* MERGEFORMAT 2
In addition to Zynq-7000, Xilinx product lines include the Virtex, Kintex and Artix series,
each including configurations and models optimized for different applications. In April 2012,
the company introduced the Vivado Design Suite - a next-generation SoC-strength design
environment for advanced electronic system designs. In May, 2014, the company shipped the
first of the next generation FPGAs: the 20 nm UltraScale.

In September 2017, Amazon.com and Xilinx started a campaign for FPGA adoption. This
campaign enables AWS Marketplace's Amazon Machine Images (AMIs) with associated
Amazon FPGA Instances created by partners. The two companies released software
development tools to simplify the creation of FPGA technology. The tools create and
manage the machine images created and sold by partners.

In July 2018, Xilinx acquired DeepPhi Technology, a Chinese machine learning startup
founded in 2016. In October 2018, the Xilinx Virtex UltraScale+ FPGAs and NGCodec's
H.265 video encoder were used in a cloud-based video coding service using the High
Efficiency Video Coding (HEVC). The combination enables video streaming with the same
visual quality as that using GPUs, but at 35%-45% lower bitrate.

In November 2018, the company's Zynq UltraScale+ family of multiprocessor system-on-

chips was certified to Safety Integrity Level (SIL) 3 HFT1 of the IEC 61508 specification.
With this certification, developers are able to use the MPSoC platform in AI-based safety-
applications of up to SIL 3, in industrial 4.0 platforms of automotive, aerospace, and AI
systems. In January 2019, ZF Friedrichshafen AG (ZF) worked with Xilinx's Zynq to power
its ProAI automotive control unit, which is used to enable automated driving applications.
Xilinx's platform overlooks the aggregation, pre-processing, and distribution of real-time
data, and accelerates the AI processing of the unit.

In November 2018, Xilinx migrated its defense-grade XQ UltraScale+ products to TSMC's

16 nm FinFET Process. The Defence-grade heterogeneous multi-processor SoC devices and
encompassed the XQ Zynq UltraScale+ MPSoCs and RFSoCs as well as XQ UltraScale+
Kintex and Virtex FPGAs. That same month the company expanded its Alveo data center
accelerator cards portfolio with the Alveo U280. The initial Alveo line included the U200
and U250, which featured 16 nm UltraScale+ Virtex FPGAs and DDR4 SDRAM. Those two
cards were launched in October 2018 at the Xilinx Developer Forum. At the Forum, Victor
Peng, CEO of semiconductor design at Xilinx, and AMD CTO Mark Papermaster, used eight
Alveo U250 cards and two AMD Epyc 7551 server CPUs to set a new world record for
inference throughput at 30,000 images per second.

Also in November 2018, Xilinx announced that Dell EMC was the first server vendor to
qualify its Alveo U200 accelerator card, used to accelerate key HPC and other workloads
with select Dell EMC PowerEdge servers. The U280 included support for high-bandwidth
memory (HBM2) and high-performance server interconnect. In August 2019, Xilinx
launched the Alveo U50, a low-profile adaptable accelerator with PCIe Gen4 support. The
U55C accelerator card was launched in November 2021, designed for HPCC and big data
workloads by incorporating the RoCE v2-based clustering solution, allowing for FPGA-
based HPCC clustering to be integrated into existing data center infrastructures.

PAGE \* MERGEFORMAT 2
In January 2019 K&L Gates, a law firm representing Xilinx sent a DMCA cease and
desist letter to an EE YouTuber claiming trademark infringement for featuring the Xilinx
logo next to Altera's in an educational video. Xilinx refused to reply until a video outlining
the legal threat was published, after which they sent an apology e-mail.

In January 2019, Baidu announced that its new edge acceleration computing product,
EdgeBoard, was powered by Xilinx. Edgeboard is a part of the Baidu Brain AI Hardware
Platform Initiative, which encompasses Baidu's open computing services, and hardware and
software products for its edge AI applications. Edgeboard is based on the Xilinx Zynq
UltraScale+ MPSoC, which uses real-time processors together with programmable logic.
The Xilinx-based Edgeboard can be used to develop products like smart-video security
surveillance solutions, advanced-driver-assistance systems, and next-generation robots.

In February 2019, the company announced two new generations of its Zynq UltraScale+ RF
system on chip (RFSoC) portfolio. The device covers the entire sub-6 GHz spectrum, which
is necessary for 5G, and the updates included: an extended millimeter wave interface, up to
20% power reduction in the RF data converter subsystem compared to the base portfolio, and
support of 5G New Radio. The second generation release covered up to 5 GHz, while the
third went up to 6 GHz. As of February, the portfolio was the only adaptable radio platform
single chip that had been designed to address the industry's 5G network needs. The second
announcement revealed that Xilinx and Samsung Electronics performed the world's first 5G
New Radio (NR) commercial deployment in South Korea. The two companies developed
and deployed 5G Massive Multiple-input, Multiple-output (m-MIMO) and millimeter wave
(mmWave) products using Xilinx's UltraScale+ platform. The capabilities are essential for
5G commercialization. The companies also announced collaboration on Xilinx's Versal
adaptable compute acceleration platform (ACAP) products that will deliver 5G services. In
February 2019, Xilinx introduced an HDMI 2.1 IP subsystem core, which enabled the
company's devices to transmit, receive, and process up to 8K (7680 x 4320 pixels) UHD
video in media players, cameras, monitors, LED walls, projectors, and kernel-based virtual
machines.

In April 2019, Xilinx entered into a definitive agreement to acquire Solarflare

Communications, Inc. Xilinx became a strategic investor in Solarflare in 2017. The
companies have been collaborating since then on advanced networking technology, and in
March 2019 demonstrated their first joint solution: a single-chip FPGA-based 100G NIC.
The acquisition enables Xilinx to combine its FPGA, MPSoC and ACAP solutions with
Solarflare's NIC technology. In August 2019, Xilinx announced that the company would be
adding the world's largest FPGA - the Virtex Ultrascale+ VU19P, to the 16 nm Virtex
Ultrascale+ family. The VU19P contains 35 billion transistors.

In June 2019, Xilinx announced that it was shipping its first Versal chips. Using ACAP, the
chips’ hardware and software can be programmed to run almost any kind of AI software. On
October 1, 2019, Xilinx announced the launch of Vitis, a unified free and open
source software platform that helps developers take advantage of hardware adaptability.

In 2019, Xilinx exceeded $3 billion in annual revenues for the first time, announcing
revenues of $3.06 billion, up 24% from the prior fiscal year. Revenues were $828 million for
PAGE \* MERGEFORMAT 2
the fourth quarter of the fiscal year 2019, up 4% from the prior quarter and up 30% year over
year. Xilinx's Communications sector represented 41% of the revenue; the industrial,
aerospace and defence sectors represented 27%; the Data Center and Test, Measurement &
Emulation (TME) sectors accounted for 18%; and the automotive, broadcast and consumer
markets contributed 14%.

In August 2020, Subaru announced the use of one of Xilinx's chips as processing power for
camera images in its driver-assistance system. In September 2020, Xilinx announced its new
chipset, the T1 Telco Accelerator card, that can be used for units running on an open RAN
5G network.

On October 27, 2020, AMD reached an agreement to acquire Xilinx in a stock-swap deal,
valuing the company at $35 billion. The deal was expected to close by the end of 2021. Their
stockholders approved the acquisition on April 7, 2021. The deal was completed on February
14, 2022. Since the acquisition was completed, all Xilinx products are co-branded as AMD
Xilinx; started in June 2023, all Xilinx's products are now being consolidated under AMD's
branding.

In December 2020, Xilinx announced they were acquiring the assets of Falcon Computing
Systems to enhance the free and open source Vitis platform, a design software for adaptable
processing engines to enable highly optimized domain specific accelerators.

In April 2021, Xilinx announced a collaboration with Mavenir to boost cell phone tower
capacity for open 5G networks. That same month, the company unveiled the Kria portfolio, a
line of small form factor system-on-modules (SOMs) that come with a pre-built software
stack to simplify development. In June, Xilinx announced it was acquiring German software
developer Silexica, for an undisclosed amount.

5.3-Technology
Xilinx designs and develops programmable logic products, including integrated circuits
(ICs), software design tools, predefined system functions delivered as intellectual property
(IP) cores, design services, customer training, field engineering and technical support. Xilinx
sells both FPGAs and CPLDs for electronic equipment manufacturers in end markets such
as communications, industrial, consumer, automotive and data processing.

Xilinx's FPGAs have been used for the ALICE (A Large Ion Collider Experiment) at
the CERN European laboratory on the French-Swiss border to map and disentangle the
trajectories of thousands of subatomic particles. Xilinx has also engaged in a partnership
with the United States Air Force Research Laboratory's Space Vehicles Directorate to
develop FPGAs to withstand the damaging effects of radiation in space, which are 1,000
times less sensitive to space radiation than the commercial equivalent, for deployment in
new satellites. Xilinx FPGAs can run a regular embedded OS and can implement processor
peripherals in programmable logic. The Virtex-II Pro, Virtex-4, Virtex-5, and Virtex-6
FPGA families, which include up to two embedded IBM PowerPC cores, are targeted to the
needs of system-on-chip (SoC) designers.
PAGE \* MERGEFORMAT 2
Xilinx's IP cores include IP for simple functions (BCD encoders, counters, etc.), for domain
specific cores (digital signal processing, FFT and FIR cores) to complex systems (multi-
gigabit networking cores, the MicroBlaze soft microprocessor and the compact Picoblaze
microcontroller). Xilinx also creates custom cores for a fee.

The main design toolkit Xilinx provides engineers is the Vivado Design Suite, an integrated
design environment (IDE) with a system-to-IC level tools built on a shared scalable data
model and a common debug environment. Vivado includes electronic system level (ESL)
design tools for synthesizing and verifying C-based algorithmic IP; standards based
packaging of both algorithmic and RTL IP for reuse; standards based IP stitching and
systems integration of all types of system building blocks; and the verification of blocks and
systems.] A free version WebPACK Edition of Vivado provides designers with a limited
version of the design environment.

Xilinx's Embedded Developer's Kit (EDK) supports the embedded PowerPC 405 and 440
cores (in Virtex-II Pro and some Virtex-4 and -5 chips) and the Microblaze core. Xilinx's
System Generator for DSP implements DSP designs on Xilinx FPGAs. A freeware version
of its EDA software called ISE WebPACK is used with some of its non-high-performance
chips. Xilinx is the only FPGA vendor to distribute a native Linux freeware synthesis
toolchain.

Xilinx announced the architecture for a new ARM Cortex-A9-based platform for embedded
systems designers, that combines the software programmability of an embedded processor
with the hardware flexibility of an FPGA. The new architecture abstracts much of the
hardware burden away from the embedded software developers' point of view, giving them
an unprecedented level of control in the development process. With this platform, software
developers can leverage their existing system code based on ARM technology and utilize
vast off-the-shelf open-source and commercially available software component
libraries. Because the system boots an OS at reset, software development can get under way
quickly within familiar development and debug environments using tools such as ARM's
Real View development suite and related third-party tools, Eclipse-based IDEs, GNU, the
Xilinx Software Development Kit and others. In early 2011, Xilinx began shipping the Zynq-
7000 SoC platform immerses ARM multi-cores, programmable logic fabric, DSP data paths,
memories and I/O functions in a dense and configurable mesh of interconnect. The platform
targets embedded designers working on market applications that require multi-functionality
and real-time responsiveness, such as automotive driver assistance, intelligent video
surveillance, industrial automation, aerospace and defence, and next-generation wireless.

Following the introduction of its 28 nm 7-series FPGAs, Xilinx revealed that several of the
highest-density parts in those FPGA product lines will be constructed using multiple dies in
one package, employing technology developed for 3D construction and stacked-die
assemblies. The company's stacked silicon interconnect (SSI) technology stacks several
(three or four) active FPGA dies side by side on a silicon interposer – a single piece of
silicon that carries passive interconnect. The individual FPGA dies are conventional, and are
flip-chip mounted by microbumps on to the interposer.

PAGE \* MERGEFORMAT 2
The interposer provides direct interconnect between the FPGA dies, with no need for
transceiver technologies such as high-speed SerDes. In October 2011, Xilinx shipped the
first FPGA to use the new technology, the Virtex-7 2000T FPGA, which includes 6.8 billion
transistors and 20 million ASIC gates. The following spring, Xilinx used 3D technology to
ship the Virtex-7 HT, the industry's first heterogeneous FPGAs, which combine high
bandwidth FPGAs with a maximum of sixteen 28 Gbit/s and seventy-two 13.1 Gbit/s
transceivers to reduce power and size requirements for key Nx100G and 400G line card
applications and functions.

In January 2011, Xilinx acquired design tool firm AutoESL Design Technologies and added
System C high-level design for its 6- and 7-series FPGA families. The addition of AutoESL
tools extended the design community for FPGAs to designers more accustomed to designing
at a higher level of abstraction using C, C++ and System C.

In April 2012, Xilinx introduced a revised version of its toolset for programmable systems,
called Vivado Design Suite. This IP and system-centric design software supports newer high
capacity devices, and speeds the design of programmable logic and I/O. Vivado provides
faster integration and implementation for programmable systems into devices with 3D
stacked silicon interconnect technology, ARM processing systems, analog mixed signal
(AMS), and many semiconductor intellectual property (IP) cores.

In July 2019, Xilinx acquired NGCodec, developers of FPGA accelerated video encoders
for video streaming, cloud gaming and cloud mixed reality services. NGCodec video
encoders include support for H.264/AVC, H.265/HEVC, VP9 and AV1, with planned future
support for H.266/VVC and AV2.

In May 2020, Xilinx installed its first Adaptive Compute Cluster (XACC) at ETH Zurich in
Switzerland. The XACCs provide infrastructure and funding to support research in adaptive
compute acceleration for high performance computing (HPC). The clusters include high-end
servers, Xilinx Alveo accelerator cards, and high speed networking. Three other XACCs will
be installed at the University of California, Los Angeles (UCLA); the University of Illinois
at Urbana Champaign (UIUC); and the National University of Singapore (NUS).

5.4-Xilinx ISE Design Flow

There are different techniques for design entry. Schematic based, Hardware Description
Language and combination of both etc. . Selection of a method depends on the design and
designer. If the designer wants to deal more with Hardware, then Schematic entry is the
better choice. When the design is complex or the designer thinks the design in an algorithmic
way then HDL is the better choice. Language based entry is faster but lag in performance
and density.

HDLs represent a level of abstraction that can isolate the designers from the details of the
hardware implementation. Schematic based entry gives designers much more visibility into
the hardware. It is the better choice for those who are hardware oriented. Another method but
rarely used is state-machines. It is the better choice for the designers who think the design as

PAGE \* MERGEFORMAT 2
a series of states. But the tools for state machine entry are limited. In this documentation we
are going to deal with the HDL based design entry.

5.5-Family Lines of Products

Before 2010, Xilinx offered two main FPGA families: the high-performance Virtex series
and the high-volume Spartan series, with a cheaper EasyPath option for ramping to volume
production. The company also provides two CPLD lines: the CoolRunner and the 9500
series. Each model series has been released in multiple generations since its launch. With the
introduction of its 28 nm FPGAs in June 2010, Xilinx replaced the high-volume Spartan
family with the Kintex family and the low-cost Artix family.

Xilinx's newer FPGA products use a High-K Metal Gate (HKMG) process, which reduces
static power consumption while increasing logic capacity. In 28 nm devices, static power
accounts for much and sometimes most of the total power dissipation. Virtex-6 and Spartan-
6 FPGA families are said to consume 50 percent less power, and have up to twice the logic
capacity compared to the previous generation of Xilinx FPGAs.

PAGE \* MERGEFORMAT 2
In June 2010, Xilinx introduced the Xilinx 7 series: the Virtex-7, Kintex-7, and Artix-7
families, promising improvements in system power, performance, capacity, and price. These
new FPGA families are manufactured using TSMC's 28 nm HKMG process. The 28 nm
series 7 devices feature a 50 percent power reduction compared to the company's 40 nm
devices and offer capacity of up to 2 million logic cells. Less than one year after announcing
the 7 series 28 nm FPGAs, Xilinx shipped the world's first 28 nm FPGA device, the Kintex-
7. In March 2011, Xilinx introduced the Zynq-7000 family, which integrates a
complete ARM Cortex-A9 MPCore processor-based system on a 28 nm FPGA for system
architects and embedded software developers. In May 2017, Xilinx expanded the 7 Series
with the production of the Spartan-7 family.

In Dec, 2013, Xilinx introduced the UltraScale series: Virtex UltraScale and Kintex
UltraScale families. These new FPGA families are manufactured by TSMC in its 20 nm
planar process. At the same time it announced an UltraScale SoC architecture, called Zynq
UltraScale+ MPSoC, in TSMC 16 nm FinFET process. In March 2021, Xilinx announced a
new cost-optimized portfolio with Artix and Zynq UltraScale+ devices, fabricated on
TSMC's 16 nm process.

Virtex family
The Virtex series of FPGAs have integrated features that include FIFO and ECC logic, DSP
blocks, PCI-Express controllers, Ethernet MAC blocks, and high-speed transceivers. In
addition to FPGA logic, the Virtex series includes embedded fixed function hardware for
commonly used functions such as multipliers, memories, serial transceivers and
microprocessor cores. These capabilities are used in applications such as wired and wireless
infrastructure equipment, advanced medical equipment, test and measurement, and defense
systems.

The Virtex 7 family, is based on a 28 nm design and is reported to deliver a two-fold system
performance improvement at 50 percent lower power compared to previous generation
Virtex-6 devices. In addition, Virtex-7 doubles the memory bandwidth compared to previous
generation Virtex FPGAs with 1866 Mbit/s memory interfacing performance and over two
million logic cells.

In 2011, Xilinx began shipping sample quantities of the Virtex-7 2000T "3D FPGA", which
combines four smaller FPGAs into a single package by placing them on a special silicon
interconnection pad (called an interposer) to deliver 6.8 billion transistors in a single large
chip. The interposer provides 10,000 data pathways between the individual FPGAs – roughly
10 to 100 times more than would usually be available on a board – to create a single
FPGA. In 2012, using the same 3D technology, Xilinx introduced initial shipments of their
Virtex-7 H580T FPGA, a heterogeneous device, so called because it comprises two FPGA
dies and one 8-channel 28Gbit/s transceiver die in the same package.

The Virtex-6 family is built on a 40 nm process for compute-intensive electronic systems,

and the company claims it consumes 15 percent less power and has 15 percent improved
performance over competing 40 nm FPGAs.
PAGE \* MERGEFORMAT 2
The Virtex-5 LX and the LXT are intended for logic-intensive applications, and the Virtex-5
SXT is for DSP applications. With the Virtex-5, Xilinx changed the logic fabric from four-
input LUTs to six-input LUTs. With the increasing complexity of combinational logic
functions required by SoC designs, the percentage of combinational paths requiring multiple
four-input LUTs had become a performance and routing bottleneck. The six-input LUT
represented a tradeoff between better handling of increasingly complex combinational
functions, at the expense of a reduction in the absolute number of LUTs per device. The
Virtex-5 series is a 65 nm design fabricated in 1.0 V, triple-oxide process technology.

Legacy Virtex devices (Virtex, Virtex-II, Virtex-II Pro, Virtex 4) are still available, but are
not recommended for use in new designs.

Kintex

The Kintex-7 family is the first Xilinx mid-range FPGA family that the company claims
delivers Virtex-6 family performance at less than half the price while consuming 50 percent
less power. The Kintex family includes high-performance 12.5 Gbit/s or lower-cost
optimized 6.5 Gbit/s serial connectivity, memory, and logic performance required for
applications such as high volume 10G optical wired communication equipment, and provides
a balance of signal processing performance, power consumption and cost to support the
deployment of Long Term Evolution (LTE) wireless networks.
In August 2018, SK Telecom deployed Xilinx Kintex UltraScale FPGAs as their artificial
intelligence accelerators at their data centers in South Korea. The FPGAs run SKT's
automatic speech-recognition application to accelerate Nugu, SKT's voice-activated
assistant.

In July, 2020 Xilinx made the latest addition to their Kintex family, 'KU19P FPGA' which
delivers more logic fabric and embedded memory[

Artix

PAGE \* MERGEFORMAT 2
The Artix-7 family delivers 50 percent lower power and 35 percent lower cost compared to
the Spartan-6 family and is based on the unified Virtex-series architecture. The Artix family
is designed to address the small form factor and low-power performance requirements of
battery-powered portable ultrasound equipment, commercial digital camera lens control, and
military avionics and communications equipment. With the introduction of the Spartan-7
family in 2017, which lack high-bandwidth transceivers, the Artix-7's was clarified as being
the "transceiver optimized" member.

Zynq

A Zynq-7000 (XC7Z010-CLG400) on a Adapteva Parallella single-board computer

The Zynq-7000 family of SoCs addresses high-end embedded-system applications, such as
video surveillance, automotive-driver assistance, next-generation wireless, and factory
automation. Zynq-7000 integrate a complete ARM Cortex-A9 MPCore-processor-based
28 nm system. The Zynq architecture differs from previous marriages of programmable logic
and embedded processors by moving from an FPGA-centric platform to a processor-centric
model. For software developers, Zynq-7000 appear the same as a standard, fully featured
ARM processor-based system-on-chip (SOC), booting immediately at power-up and capable
of running a variety of operating systems independently of the programmable logic. In 2013,
Xilinx introduced the Zynq-7100, which integrates digital signal processing (DSP) to meet
emerging programmable systems integration requirements of wireless, broadcast, medical
and military applications.

The new Zynq-7000 product family posed a key challenge for system designers, because
Xilinx ISE design software had not been developed to handle the capacity and complexity of
designing with an FPGA with an ARM core. Xilinx's new Vivado Design Suite addressed
this issue, because the software was developed for higher capacity FPGAs, and it
included high level synthesis (HLS) functionality that allows engineers to compile the co-
processors from a C-based description.

The AXIOM, the world's first digital cinema camera that is open source hardware, contains a
Zynq-7000.

Spartan family

The Spartan series targets low cost, high-volume applications with a low-power footprint
e.g. displays, set-top boxes, wireless routers and other applications.

PAGE \* MERGEFORMAT 2
The Spartan-6 family is built on a 45 nm, 9-metal layer, dual-oxide process technology. The
Spartan-6 was marketed in 2009 as a low-cost option for automotive, wireless
communications, flat-panel display and video surveillance applications.

The Spartan-7 family, built on the same 28 nm process used in the other 7-Series FPGAs,
was announced in 2015, and became available in 2017. Unlike the Artix-7 family and the
"LXT" members of the Spartan-6 family, the Spartan-7 FPGAs lack high-bandwidth
transceivers.

EasyPath
Because EasyPath devices are identical to the FPGAs that customers are already using the
parts can be produced faster and more reliably from the time they are ordered compared to
similar competing programs.[175]

Versal
Versal is Xilinx's 7 nm architecture that targets heterogeneous computing needs in datacenter
acceleration applications, in artificial intelligence acceleration at the edge, Internet of
Things (IoT) applications and embedded computing.

The Everest program focuses on the Versal Adaptive Compute Acceleration Platform
(ACAP), a product category combining a traditional FPGA fabric with an ARM system on
chip and a set of coprocessors, connected through a network on a chip. Xilinx's goal was to
reduce the barriers to adoption of FPGAs for accelerated compute-intensive datacenter
workloads. They are designed for a wide range of applications in the fields of big
data and machine learning, including video transcoding, database querying, data
compression, search, AI inferencing, machine vision, computer vision, autonomous
vehicles, genomics, computational storage and network acceleration.

On April 15, 2020, it was announced that Xilinx would supply its Versal chips to Samsung
Electronics for 5G networking equipment. In July 2021, Xilinx debuted the Versal HBM,
which combines the network interface of the platform with HBM2e memory to alleviate data
bottlenecking.

PAGE \* MERGEFORMAT 2
6.VLSI Design - FPGA Technology

6.1-Introduction to FPGA
The full form of FPGA is “Field Programmable Gate Array”. It contains ten thousand to
more than a million logic gates with programmable interconnection. Programmable
interconnections are available for users or designers to perform given functions easily. A
typical model FPGA chip is shown in the given figure. There are I/O blocks, which are
designed and numbered according to function. For each module of logic level composition,
there are CLB’s (Configurable Logic Blocks).

CLB performs the logic operation given to the module. The inter connection between CLB
and I/O blocks are made with the help of horizontal routing channels, vertical routing
channels and PSM (Programmable Multiplexers).

The number of CLB it contains only decides the complexity of FPGA. The functionality of
CLB’s and PSM are designed by VHDL or any other hardware descriptive language. After
programming, CLB and PSM are placed on chip and connected with each other with routing
channels.

FPGA is an integrated circuit that may be programmed to execute a tailored function for a
particular purpose. FPGAs have become highly popular in the VLSI area. The code for
FPGA programming is written in languages like VHDL and Verilog.

PAGE \* MERGEFORMAT 2
6.2-Architectural Design of FPGA in VLSI:
FPGA is made up of thousands of basic pieces known as Configurable Logic Blocks (CLBs)
that are surrounded by a system of programmable interconnects known as a fabric that
distributes signals between CLBs and I/O blocks, which link the FPGA to external devices.
Multiplexers, Full Adders, D flip flops, and a lookup table (LUT) make up the Logic
Component, which is the basic building block of FPGA in VLSI. For each given source of
input, LUTs decide the output. LUTs with 4-6 input bits are common, and after testing, they
can even go up to 8 bits. The output of the LUT is stored in a D flip flop.

6.3-Categories of FPGA
FPGA in VLSI is categorized into the following categories based on their applications:

1. Low-End FPGAs: They use less power and are less complicated due to the lower number
of gates.
2. Mid-Range FPGAs: They use more power and have more gates than low-end FPGAs,
making them more sophisticated. They maintain a balance between cost and performance.
3. High-End FPGAs: With a higher gate density than mid-range FPGA in VLSI, they are
more complicated. Some High-End FPGAs outperform low-end and mid-range FPGAs in
terms of performance.

6.4-Advantages of FPGA in VLSI

1. FPGAs in VLSI give higher performance than a standard CPU because they are capable
of parallel computing.
2. FPGAs are reprogrammable and cost-effective.
3. FPGAs allow you to complete product development in a short amount of time, allowing
you to get your product to market faster.
4. No physical manufacturing steps are involved in it.

6.5-Disadvantages of FPGA in VLSI

1. PGAs use a lot of power and programmers have no control over power optimization;
2. FPGA programming is not as straightforward as C programming;
3. They are only used in low-volume production.

6.6-Applications of FPGA in VLSI

FPGAs are used in defence equipment for image processing in SDRs, ASIC prototypes,
high-performance computers, wireless communication systems such as WiMAX, WCDMA,
and others, and in medical equipment for diagnosis and therapy. They are also found in
consumer devices, such as flat-panel displays and household set-top boxes.

PAGE \* MERGEFORMAT 2
6.7-Need of an FPGA in VLSI
FPGAs are widely used in quick prototyping and verification of conceptual designs, as well
as in electrical systems when mask-production of a bespoke IC becomes prohibitively
expensive owing to low volume. In addition to their utility, as previously stated, their
internal structure makes them an excellent vehicle for learning all aspects of VLSI design, as
they contain combinational logic in the form of LUTs (look-up tables), sequential building
blocks in the form of flip-flops, and memory for programmability.

VLSI design necessitates meticulous planning throughout the design process, with a focus
on floor planning, layout, routing, transistor size, clock and power distribution, and timing
analysis. All of these concepts of VLSI design apply to the design of a basic FPGA that was
created as an individual project in a VLSI class.

FPGA in VLSI was programmed to create a traffic light controller. The design process was a
highly valuable tool for learning about VLSI design because it included all parts of a
complicated VLSI design.

6.8-Gate Array Design

The gate array (GA) ranks second after the FPGA, in terms of fast prototyping capability.
While user programming is important to the design implementation of the FPGA chip, metal
mask design and processing is used for GA. Gate array implementation requires a two-step
manufacturing process.

The first phase results in an array of uncommitted transistors on each GA chip. These
uncommitted chips can be stored for later customization, which is completed by defining the
metal interconnects between the transistors of the array. The patterning of metallic
interconnects is done at the end of the chip fabrication process, so that the turn-around time
can still be short, a few days to a few weeks. The figure given below shows the basic
processing steps for gate array implementation.

PAGE \* MERGEFORMAT 2
Typical gate array platforms use dedicated areas called channels, for inter-cell routing
between rows or columns of MOS transistors. They simplify the interconnections.
Interconnection patterns that perform basic logic gates are stored in a library, which can then
be used to customize rows of uncommitted transistors according to the netlist.

In most of the modern GAs, multiple metal layers are used for channel routing. With the use
of multiple interconnected layers, the routing can be achieved over the active cell areas; so
that the routing channels can be removed as in Sea-of-Gates (SOG) chips. Here, the entire
chip surface is covered with uncommitted NMOS and PMOS transistors. The neighbouring
transistors can be customized using a metal mask to form basic logic gates.

For inter cell routing, some of the uncommitted transistors must be sacrificed. This design
style results in more flexibility for interconnections and usually in a higher density. GA chip
utilization factor is measured by the used chip area divided by the total chip area. It is higher
than that of the FPGA and so is the chip speed.

6.9-Standard Cell Based Design

A standard Cell based design requires development of a full custom mask set. The standard
cell is also known as the polycell. In this approach, all of the commonly used logic cells are
developed, characterized and stored in a standard cell library.

A library may contain a few hundred cells including inverters, NAND gates, NOR gates,
complex AOI, OAI gates, D-latches and Flip-flops. Each gate type can be implemented in
several versions to provide adequate driving capability for different fan-outs. The inverter
gate can have standard size, double size, and quadruple size so that the chip designer can
select the proper size to obtain high circuit speed and layout density.

Each cell is characterized according to several different characterization categories, such as,

 Delay time versus load capacitance

 Circuit simulation model
 Timing simulation model
 Fault simulation model
 Cell data for place-and-route
 Mask data

For automated placement of the cells and routing, each cell layout is designed with a fixed
height, so that a number of cells can be bounded side-by-side to form rows. The power and
ground rails run parallel to the upper and lower boundaries of the cell. So that, neighbouring
cells share a common power bus and a common ground bus. The figure shown below is a
floorplan for standard-cell based design.

PAGE \* MERGEFORMAT 2
6.10- Custom Design
Full Custom Design:

In a full-custom design, the entire mask design is made new, without the use of any library.
The development cost of this design style is rising. Thus, the concept of design reuse is
becoming famous to reduce design cycle time and development cost.

The hardest full custom design can be the design of a memory cell, be it static or dynamic.
For logic chip design, a good negotiation can be obtained using a combination of different
design styles on the same chip, i.e. standard cells, data-path cells, and programmable logic
arrays (PLAs).

Practically, the designer does the full custom layout, i.e. the geometry, orientation, and
placement of every transistor. The design productivity is usually very low, typically a few
tens of transistors per day, per designer. In digital CMOS VLSI, full-custom design is hardly
used due to the high labour cost. These design styles include the design of high-volume
products such as memory chips, high-performance microprocessors and FPGA.

Semi-Custom Design:

This design methodology is used where there is less time for design and fewer quantities.
Here, most of the modules are predesigned and pretested. We can also add other components
to it. This will reduce the design time but it is not optimized and cost-efficient for mass
production. It has less production time.

PAGE \* MERGEFORMAT 2
7.SIMULATION RESULTS
7.1- 16 BIT ALU PROPOSED BLOCK DIAGRAMS

BLOCK DIAGRAM:

Algorithm:
1. Write the code for ALU based on the preferred bits of the ALU.
2. Execute the program in Synthesis/implementation mode by checking syntax in the
Synthesize-XST.
3. To view the RTL of the program run View RTL Schematic from Synthesize-XST.
4. If there are any errors in the code, then debug the errors, correct them and re run the
code.
5. After executing without errors provide the input values (A, B, ALU_Sel)in the
testbench.
6. Consider the input in the binary form, to display in the binary form right click on the
input line and select Binary from the list.
7. Save the inputs and then run the simulation model in behavioural synthesis.
8. To change the input click on the input line and give your binary value of the input.
9. Observe the output based on the inputs provided and the selected input given.
10.The output would be in the form of decimal.
11.To change the output number format right click on the output line and the select the
preferred number system (binary, decimal(signed, unsigned),hexadecimal,octal….).

PAGE \* MERGEFORMAT 2
ARCHITECTURE:

PAGE \* MERGEFORMAT 2
Sub block 1:

The below block is the sub block for inputs of the architecture at first level.

This is the internal architecture of the subblock at level 1.

Subblock2:

The below block is the subblock of the main architecture at the second level.

PAGE \* MERGEFORMAT 2
This is the architecture of subblock at level 2 of the main architecture .
Subblock 3:

The above is the block for the addition and subtraction operations in the proposed
architecture.
Subblock 4:

This the block for multiplication operation of the proposed architecture.

PAGE \* MERGEFORMAT 2
Subblock 5:

Block diagram for Less than operation in the design.

Subblock 6:

Block diagram for Equal operation in the design of ALU.

Subblock 7:

Block diagram for Greater than operation in the ALU design.

Subblock 8:

PAGE \* MERGEFORMAT 2
Architecture for the subblock of the third level in the architecture.
Subblock 9:

This is the block diagram of the first multiplexer used at the final level of the architecture.