PROJECT REPORT ON
ALU
(ARITHMETIC &
LOGICAL
UNIT)
BY
NARESH SINGH DOBAL
9540493245
1. INTRODUCTION TO VHDL
Introduction
VHDL is an acronym for VHSIC Hardware Description
Language (VHSIC stands for Very High Speed Integrated
Circuits). It is a hardware description language that can be
used to model a digital system at many levels of
abstraction, ranging from the algorithmic level to the gate
level. The complexity of the digital system being modeled
could vary from that of a simple gate to a complete digital
electronic system, or anything in between. The VHDL
language can be regarded as an integrated amalgamation
of the following languages:
Sequential languages +
Concurrent languages +
Net list languages +
Timing specification +
Waveform generation language
=> VHDL
Therefore the language has constructs that enable the
designer to express the concurrent or sequential behavior
of a digital system with or without timing. It also allows
modeling the system as an interconnection of components.
Test waveforms can also be generated using the same
constructs. All the above constructs can be combined to
provide a comprehensive description of the system in a
single model.
1.1 Advantages of VHDL over other hardware description languages
1. The language can be used as a communication medium
between different CAD and CAE tools.
2. The language supports hierarchy; that is, a digital
system can be modeled as a set of interconnected
components, and each component in turn can be modeled
as a set of interconnected subcomponents.
3. The language supports flexible design methodologies:
top-down, bottom-up or mixed.
4. It supports both synchronous and asynchronous timing
models.
5. Various digital modeling techniques such as finite state
machine descriptions, algorithmic descriptions and
Boolean equations can be modeled using this language.
6. The language is publicly available, human readable,
machine readable and not proprietary.
7. The language supports three basic different description
styles: structural, dataflow and behavioral.
8. Arbitrarily large designs can be modeled using the
language and therefore there are no limitations imposed
by the language on the size of a design.
9. The model can not only describe the functionality of a
design, but also contain information about the design
itself in terms of user-defined attributes, such as total
area and speed.
10. The capability of defining new data types provides the
power to describe and simulate a new design technology
at a very high level of abstraction without any concern
about the implementation details.
1.2 VHDL: The language
VHDL is a hardware description language that can be
used to model a digital system. The digital system can be
as simple as a logic gate or as complex as a complete
electronic system.
To describe an entity VHDL provides five different
types of primary constructs, called design units. They are
1. Entity declaration.
2. Architecture body.
3. Configuration declaration.
4. Package declaration.
5. Package body.
1.2.1 Entity declaration
An entity is modeled using an entity declaration and
at least one architecture body. The entity declaration
describes the external view of an entity. The entity
declaration specifies the name of the entity being
modeled and lists the set of interface ports. Ports are
signals (wires) through which the entity communicates
with the other models in its external environment. An
example for a four-input AND gate is given below.
Figure: AND_GATE
This entity, called AND_GATE, has four input ports In1,
In2, In3, In4 and one output port Out1. std_logic is a
type defined in the IEEE std_logic_1164 package.
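Since the figure is not reproduced here, the following is a
minimal sketch of what such an entity declaration could look
like. The port names follow the description above; the layout
is an assumption and not necessarily the original figure.

```vhdl
library ieee;
use ieee.std_logic_1164.all;  -- provides the std_logic type

-- External view only: four input ports and one output port.
-- No behavior is described here; that belongs in an architecture body.
entity AND_GATE is
  port (
    In1, In2, In3, In4 : in  std_logic;
    Out1               : out std_logic
  );
end entity AND_GATE;
```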
1.2.2 Architecture body
The second important part of a VHDL source file is
the architecture declaration. Every entity declaration you
write must be accompanied by at least one corresponding
architecture. An architecture declaration is a statement
that describes the underlying function and/or structure of
a circuit. Each architecture in your design must be
associated by name with one entity in the design. The
architecture body contains the internal description of the
entity. The internal structure can be specified by any of
the following modeling styles.
a) As a set of interconnected components.
b) As a set of concurrent assignment statements.
c) As a set of sequential assignment statements.
d) As a combination of the above three.
VHDL architectures can be classified as
Behavioral
Describes the function of the design as sequentially
executed statements.
Structural
Describes the design as interconnections between
previously defined components.
Dataflow
Describes the flow of data through the design using
concurrent signal assignment statements.
The different modeling styles are explained below.
A. Structural style of modeling
In this style the entity is modeled as a set of
interconnected components. Such a model for an AND4 is
shown. The name of the architecture body is AND4. The
architecture body is composed of two parts: the
declarative part (before the keyword begin) and the
statement part (after the keyword begin). Two component
declarations are present in the declarative part of the
architecture body.
The declared components are instantiated in the
statement part of the architecture body using component
instantiation statements. U1, U2, and U3 are the
component labels for these component instantiations. The
first component instantiation statement, labeled U1, shows
that signals A and B are connected to input ports In1 and
In2 of component AND2, and that TEMP1 is connected to
the output port of the AND2 entity. Similarly, in the second
and third component instantiation statements, the signals
are connected to the respective ports of the AND2 entities.
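The structural model is not reproduced in this report, so the
following is only a sketch of how an AND4 could be built from
three AND2 instances. The AND2 entity, its port names (X, Y, Z),
the architecture name STRUCT and the signal TEMP2 are
assumptions made for illustration and may differ from the
original figure.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical two-input AND gate used as the building block.
entity AND2 is
  port (X, Y : in std_logic; Z : out std_logic);
end entity AND2;

architecture DATAFLOW of AND2 is
begin
  Z <= X and Y;
end architecture DATAFLOW;

library ieee;
use ieee.std_logic_1164.all;

entity AND4 is
  port (In1, In2, In3, In4 : in std_logic; Out1 : out std_logic);
end entity AND4;

architecture STRUCT of AND4 is
  -- Declarative part: component and internal signal declarations.
  component AND2
    port (X, Y : in std_logic; Z : out std_logic);
  end component;
  signal TEMP1, TEMP2 : std_logic;
begin
  -- Statement part: three component instantiation statements.
  U1 : AND2 port map (X => In1,   Y => In2,   Z => TEMP1);
  U2 : AND2 port map (X => In3,   Y => In4,   Z => TEMP2);
  U3 : AND2 port map (X => TEMP1, Y => TEMP2, Z => Out1);
end architecture STRUCT;
```

Because the three instantiations are concurrent statements,
their order in the architecture body does not matter.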
B. Data flow style of modeling
In this modeling style, the flow of data through the
entity is expressed primarily using concurrent signal
assignment statements. The structure of the entity is not
explicitly specified in this modeling style, but it can be
implicitly deduced. The data flow model of the AND4 entity
is given below.
The dataflow is described using two concurrent signal
assignment statements (as opposed to sequential signal
assignment statements, which appear inside a process). In
a signal assignment statement, the symbol <= implies an
assignment of a value to a signal. The value of the
expression on the right-hand side of the statement is
computed and is assigned to the signal on the left-hand
side, called the target signal. A concurrent signal
assignment statement is executed only when any signal
used in the expression on the right-hand side has an event
on it, that is, when the value of the signal changes. Delay
information can also be included in signal assignment
statements using after clauses.
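As the dataflow model itself is missing from this report, here
is a minimal sketch of an AND4 written with two concurrent
signal assignment statements. The internal signal TEMP and the
2 ns delay values are assumptions chosen only to illustrate the
after clause.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity AND4 is
  port (In1, In2, In3, In4 : in std_logic; Out1 : out std_logic);
end entity AND4;

architecture DATAFLOW of AND4 is
  signal TEMP : std_logic;
begin
  -- Each statement is re-evaluated whenever a signal on its
  -- right-hand side has an event; textual order is irrelevant.
  TEMP <= In1 and In2 after 2 ns;           -- illustrative delay
  Out1 <= TEMP and In3 and In4 after 2 ns;  -- illustrative delay
end architecture DATAFLOW;
```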
C. Behavioral style of modeling
The behavioral style of modeling specifies the behavior of
an entity as a set of statements that are executed
sequentially in the specified process statement. They do
not explicitly specify the structure of the entity but merely
its functionality. A process statement is a concurrent
statement that can appear within an architecture body.
For example, consider the following behavioral model for
the same AND4.
A process statement also has a declarative part
(before the keyword begin) and a statement part (between
the keywords begin and end process). The statements
appearing within the statement part are executed
sequentially. The list of signals specified within the
parentheses after the keyword process constitutes a
sensitivity list, and the process statement is invoked
whenever there is an event on any signal in the list. In
the example, when an event occurs on In1, In2, In3 or In4,
the statements appearing within the process are executed
sequentially. However, all the processes that appear in a
design are executed concurrently. Signal assignment
statements appearing within a process are called
sequential signal assignment statements. Sequential signal
assignment statements, including variable assignment
statements, are executed sequentially, independent of
whether an event occurs on any signal in their right-hand
side expressions.
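The behavioral model referred to above is not reproduced, so
the following is a sketch of how the same AND4 could be written
with a single process. The variable name result and the
architecture name BEHAVIOR are assumptions.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity AND4 is
  port (In1, In2, In3, In4 : in std_logic; Out1 : out std_logic);
end entity AND4;

architecture BEHAVIOR of AND4 is
begin
  -- Sensitivity list: the process runs on an event on any input.
  process (In1, In2, In3, In4)
    -- Declarative part of the process.
    variable result : std_logic;
  begin
    -- Statement part: executed top to bottom, one after another.
    result := In1 and In2;     -- variable assignment, takes effect at once
    result := result and In3;
    result := result and In4;
    Out1 <= result;            -- sequential signal assignment
  end process;
end architecture BEHAVIOR;
```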
D. Mixed style of modeling
It is possible to mix the three modeling styles that
were described before in a single architecture body. That
is, within an architecture body, we could use component
instantiation statements and other concurrent statements.
All of these are concurrent statements; therefore their
order of appearance within the architecture body is not
important. Note that a process statement itself is a
concurrent statement; however, the statements within a
process statement are always executed sequentially.
1.2.3 Configuration declaration
A configuration declaration is used to select one of
the possibly many architecture bodies that an entity may
have, and to bind component instances to entities. For
structural models, configurations can be thought of as the
parts list for the model. For each component instance, the
configuration specifies which of an entity's possibly many
architectures to use. When the configuration for an
entity-architecture combination is compiled into the
library, a simulatable object is created. An example of the
configuration declaration for the HALF_ADDER entity is
given below (VHDL identifiers may contain underscores but
not hyphens).
library CMOS_LIB, MY_LIB;
configuration CONFIG of HALF_ADDER is
  for HA_STRUCTURE
    -- Bind instance X1 directly to an entity-architecture pair.
    for X1: XOR2
      use entity CMOS_LIB.XOR_GATE(DATAFLOW);
    end for;
    -- Bind instance A1 through another configuration.
    for A1: AND2
      use configuration MY_LIB.AND_CONFIG;
    end for;
  end for;
end CONFIG;
1.2.4 Package
The primary purpose of a package is to encapsulate
elements that can be shared (globally) among two or more
design units. A package is a common storage area used to
hold data to be shared among a number of entities.
Declaring data inside of a package allows the data to be
referenced by other entities; thus, the data can be shared.
A package consists of two parts: a package declaration
section and a package body. The package declaration
defines the interface for the package, much the same way
that the entity defines the interface for a model. The
package body specifies the actual behavior of the package
in the same method that the architecture statement does
for a model.
A package is a collection of commonly used subprograms,
data types and constants.
Packages save coding effort and promote code reuse.
The packages STANDARD and TEXTIO, provided in the STD
library, define useful data types and utilities.
A library clause makes a library accessible, and a use
clause makes the contents of a package visible.
A package consists of two parts:
Package header:
This defines the contents of a package that are made visible
after the statement 'use library_name.package_name.all;'
Package body:
This provides the implementation details of subprograms.
Items declared in the body are not visible to the user of
the package.
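A minimal sketch of a package header and body is given below to
make the split concrete. The package name ALU_TYPES, the
constant WIDTH, the subtype word_t and the function is_zero are
all hypothetical names introduced for illustration.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Package header: everything declared here is visible to users.
package ALU_TYPES is
  constant WIDTH : natural := 8;
  subtype word_t is std_logic_vector(WIDTH - 1 downto 0);
  function is_zero (w : word_t) return std_logic;
end package ALU_TYPES;

-- Package body: implementation details, hidden from users.
package body ALU_TYPES is
  function is_zero (w : word_t) return std_logic is
  begin
    for i in w'range loop
      if w(i) /= '0' then
        return '0';
      end if;
    end loop;
    return '1';
  end function is_zero;
end package body ALU_TYPES;
```

Assuming the package is compiled into the working library, a
design unit would gain access to it with
'use work.ALU_TYPES.all;'.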
1.2.5 Test bench
A test bench is used to verify the functionality of a design.
The test bench allows the designer to verify the
functionality of the design at each step in the HDL
synthesis-based methodology. When the designer makes a
small change to fix an error, the change can be tested to
make sure that it did not affect other parts of the design.
New versions of the design can be verified against known
good results to verify compatibility. A test bench is at the
highest level in the hierarchy of the design. The test
bench instantiates the design under test (DUT). It
provides the necessary input stimulus to the DUT and
examines the output from the DUT.
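The following sketch shows the general shape of a self-checking
test bench for the four-input AND gate described earlier. The
DUT architecture, the test bench name AND_GATE_TB, the signal
names and the 10 ns steps are assumptions made for illustration.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Design under test (repeated here so the example is self-contained).
entity AND_GATE is
  port (In1, In2, In3, In4 : in std_logic; Out1 : out std_logic);
end entity AND_GATE;

architecture DATAFLOW of AND_GATE is
begin
  Out1 <= In1 and In2 and In3 and In4;
end architecture DATAFLOW;

library ieee;
use ieee.std_logic_1164.all;

-- A test bench has no ports: it is the top of the hierarchy.
entity AND_GATE_TB is
end entity AND_GATE_TB;

architecture TEST of AND_GATE_TB is
  signal a, b, c, d : std_logic := '0';
  signal y          : std_logic;
begin
  -- Instantiate the DUT.
  DUT : entity work.AND_GATE
    port map (In1 => a, In2 => b, In3 => c, In4 => d, Out1 => y);

  -- Drive the inputs and examine the output.
  stimulus : process
  begin
    a <= '1'; b <= '1'; c <= '1'; d <= '0';
    wait for 10 ns;
    assert y = '0' report "expected 0 when one input is low" severity error;

    d <= '1';
    wait for 10 ns;
    assert y = '1' report "expected 1 when all inputs are high" severity error;

    wait;  -- suspend forever: end of stimulus
  end process stimulus;
end architecture TEST;
```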
2. HIGH LEVEL DESIGN FLOW FOR XILINX SOFTWARE TOOLS
The high level design flow is illustrated in the figure.
Each step is explained below.
2.1 HDL Capture
After the specification has been completed, the designer
can begin the process of implementation. The designer
creates the VHDL description that describes the clock-by-
clock behavior of the design. The VHDL codes for entities
of the design are entered. The designer then checks the
design for any syntax errors. After all syntax errors are
removed, the VHDL code is verified for correctness by
simulating it.
2.2 VHDL Synthesis
The goal of the VHDL synthesis step is to create a design
that implements the required functionality and matches
the designer's constraints in speed, area, or power.
The VHDL synthesis tools convert the VHDL description
into a net list in the target FPGA or ASIC technology. For
the VHDL synthesis tool to perform this step properly, the
VHDL code must be written in a particular style.
The designer reads the VHDL design into the VHDL
synthesis tool. The tool reports syntax errors and
synthesis errors. Synthesis errors usually result from the
designer using constructs that are not synthesizable. In
such cases, the code has to be modified and simulated
again. The synthesizer produces an output net list in the
target technology and a number of report files. The
designer looks at the report files to determine the quality
of the synthesis output. The most common output files are
the timing report and the area report.
Most synthesis tools produce a number of other reports
such as hierarchy reports, instance reports, net reports,
power reports, and others.
The most useful reports initially are the timing and area
reports, because these are usually the most critical
factors.
The area report shows the designer how much of the
resources of the chip the design has consumed. The
designer can tell if the design is too big for a particular
chip and the designer needs to target a larger chip, if the
design should go into a smaller chip, or if the current chip
will work fine. The designer can also get a relative size of
the design to use in later stages of the design process.
The timing report shows the timing of critical paths or
specified paths of the design. The designer examines the
timing of the critical paths closely because these paths
ultimately determine how fast the design can run. If the
longest path is a timing critical part of the design and is
not meeting the speed requirements of the designer, then
the designer may have to modify the VHDL code or try
new timing constraints to make the path meet timing.
The most important type of output data is the netlist for
the design in the target technology. This output is a gate
or macro level output in a format compatible with the
place and route tools that are used to implement the
design in the target chip. For instance, most place and
route tools for FPGA technologies take in an EDIF netlist
as an input format. The primitives used in the netlist are
those used in the synthesis library to describe the
technology. The place and route tools understand what to
do with these primitives in terms of how to place a
primitive and how to route wires to them.
2.3 RTL Simulation
In RTL simulation, the designer uses stimulus that
represents the design environment to drive the design
and checks that the results are correct. A
standard VHDL simulator can be used to read the RTL
VHDL description and verify the correctness of the design.
The VHDL simulator reads the VHDL description, compiles
it into an internal format, and then executes the compiled
format using test vectors. The designer can look at the
output of the simulation and determine whether or not the
design is working properly. The designer has a number of
ways to analyze the output. The most common are
waveform output and tabular output.
2.4 Functional Gate Level Verification
Some designers might want to do a quick check on the
output of the synthesis tool to make sure that the
synthesis tool produced a design that is functionally
correct. To do this the designer runs a functional gate
level verification. The designer reads the output VHDL
netlist from the synthesis tool plus a library of the
synthesis primitives into the VHDL simulator and runs the
simulation using the RTL Verification vectors. If the design
matches, then the synthesis tool did not produce logic
mismatches; if it does not match, the designer needs to
debug the VHDL RTL description to see what is wrong.
2.5 Implementation
Implementation (place and route) tools are used to take
the design netlist and implement the design in the target
technology device. The place and route tools place each
primitive from the netlist into an appropriate location on
the target device and then route signals between the
primitives to connect the devices according to the netlist.
One input to the place and route tools is the netlist in EDIF
or another netlist format. Another input to some place and
route tools is the timing constraints, which tell the place
and route tools which signals have critical timing
associated with them so that these nets can be routed in
the most timing-efficient manner. These nets are typically
identified during the static timing analysis process during
synthesis. These constraints tell the place and route tool
to place the primitives in close proximity to one another
and to use the fastest routing. The closer the cells are,
the shorter the routed signals will be and the shorter the
time delay.
Some place and route tools allow the designer to
specify the placement of large parts of the design. This
process is also known as floor planning. Floor planning
allows the user to pick locations on the chip for large
blocks of the design so that routing wires are as short as
possible. The designer lays out blocks on the chip as
general areas. The floor planner feeds this information to
the place and route tools so that these blocks are placed
properly. After the cells are placed, the router makes the
appropriate connections.
After all the cells are placed and routed, the output of
the place and route tools consists of data files that can be
used to implement the chip. In the case of FPGAs, these
files describe all of the connections needed for the FPGA
macrocells to implement the functionality required.
Antifuse FPGAs use this information to burn the
appropriate fuses, while reprogrammable devices download
this information to the device to turn on the appropriate
transistor connections. The other output from the place
and route software is a file used to generate the timing
file. This file describes the actual timing of the
programmed FPGA device or the final ASIC device. This
timing file describes, as closely as possible, the timing
that would be extracted from the device when it is plugged
into the system for testing. The most common format of
this file for most simulators is SDF (Standard Delay Format).
2.6 Post Layout Timing verification
After the place and route process has completed, the
designer will want to verify the results of the place and
route process. There are a number of methods to
accomplish this task, but the most common is to use
post-route gate-level simulation. This simulation combines
the netlist used for place and route with the timing file
from the place and route process into a simulation that
checks both functionality and timing of the design. The
designer can run the simulation and generate accurate
output waveforms that show whether or not the device is
operating properly and if the timing is being met. For
VHDL simulations this requires a VITAL-compliant VHDL
simulator (VITAL is a standard way of describing designs
that allows SDF timing back-annotation).
3. DESIGNING STEPS
1. STARTING THE XILINX PROJECT NAVIGATOR SOFTWARE
2. CREATING A NEW PROJECT
3. NEW PROJECT WIZARD
4. NEW PROJECT WIZARD (DEVICE PROPERTIES WINDOW)
5. NEW PROJECT WIZARD (PROJECT SUMMARY)
6. CREATE NEW SOURCE WIZARD
7. CREATE NEW SOURCE WIZARD (PORT DEFINE)
8. CREATE NEW SOURCE WIZARD (MODULE SUMMARY)
9. ARCHITECTURE CODING
INTRODUCTION TO ARITHMETIC LOGIC UNIT (ALU)
What is ALU (Arithmetic Logic Unit)
ALU is an abbreviation of arithmetic logic unit, the part of a computer that performs
all arithmetic computations, such as addition and multiplication, and all comparison
operations. The ALU is one component of the CPU (central processing unit).
The Arithmetic Logic Unit (ALU) is essentially the heart of a CPU. This is what
allows the computer to add, subtract, and to perform basic logical operations such
as AND/OR. Since every computer needs to be able to do these simple functions,
they are always included in a CPU. How a company designs their ALU has a
significant impact on the overall performance of their CPU. In this article I will
give a brief introduction to some basics of ALU design; you will quickly see how
complicated these things can get.
An arithmetic-logic unit (ALU) is the part of a computer processor (CPU) that
carries out arithmetic and logic operations on the operands in computer instruction
words. In some processors, the ALU is divided into two units, an arithmetic unit
(AU) and a logic unit (LU). Some processors contain more than one AU - for
example, one for fixed-point operations and another for floating-point operations.
(In personal computers floating point operations are sometimes done by a floating
point unit on a separate chip called a numeric coprocessor.)
Typically, the ALU has direct input and output access to the processor controller,
main memory (random access memory or RAM in a personal computer), and
input/output devices. Inputs and outputs flow along an electronic path that is called
a bus. The input consists of an instruction word (sometimes called a machine
instruction word) that contains an operation code (sometimes called an "op code"),
one or more operands, and sometimes a format code. The operation code tells the
ALU what operation to perform and the operands are used in the operation. (For
example, two operands might be added together or compared logically.) The
format may be combined with the op code and tells, for example, whether this is a
fixed-point or a floating-point instruction. The output consists of a result that is
placed in a storage register and settings that indicate whether the operation was
performed successfully. (If it isn't, some sort of status will be stored in a permanent
place that is sometimes called the machine status word.)
In general, the ALU includes storage places for input operands, operands that are
being added, the accumulated result (stored in an accumulator), and shifted results.
The flow of bits and the operations performed on them in the subunits of the ALU
is controlled by gated circuits. The gates in these circuits are controlled by a
sequence logic unit that uses a particular algorithm or sequence for each operation
code. In the arithmetic unit, multiplication and division are done by a series of
adding or subtracting and shifting operations. There are several ways to represent
negative numbers. In the logic unit, one of 16 possible logic operations can be
performed - such as comparing two operands and identifying where bits don't
match.
The design of the ALU is obviously a critical part of the processor and new
approaches to speeding up instruction handling are continually being developed.
In computing, an arithmetic logic unit (ALU) is a digital circuit that performs
arithmetic and logical operations. The ALU is a fundamental building block of the
central processing unit (CPU) of a computer, and even the simplest
microprocessors contain one for purposes such as maintaining timers. The
processors found inside modern CPUs and graphics processing units (GPUs)
accommodate very powerful and very complex ALUs; a single component may
contain a number of ALUs.
BASIC BUILDING BLOCKS OF AN ALU
CONTROL UNIT The control unit maintains order within the computer
system and directs the flow of traffic (operations) and data. The flow
of control is indicated by the dotted arrows on figure 1-1. The control
unit selects one program statement at a time from the program storage
area, interprets the statement, and sends the appropriate electronic
impulses to the arithmetic-logic unit and storage section to cause
them to carry out the instruction. The control unit does not perform
the actual processing operations on the data. Specifically, the control
unit manages the operations of the CPU, be it a single-chip
microprocessor or a full-size mainframe. Like a traffic director, it
decides when to start and stop (control and timing), what to do
(program instructions), where to keep information (memory), and with
what devices to communicate (I/O). It controls the flow of all data
entering and leaving the computer. It accomplishes this by
communicating or interfacing with the arithmetic-logic unit, memory,
and I/O areas. It provides the computer with the ability to function
under program control. Depending on the design of the computer, the
CPU can also have the capability to function under manual control
through man/machine interfacing. The control unit consists of several
basic logically defined areas. These logically defined areas work
closely with each other. Timing in a computer regulates the flow of
signals that control the operation of the computer. The instruction
and control portion makes up the decision-making and memory-type
functions. Addressing is the process of locating the operand (specific
information) for a given operation. An interrupt is a break in the
normal flow of operation of a computer (e.g., CTRL + ALT + DEL).
Control memory is a random-access memory (RAM) consisting of
addressable storage registers. Cache memory is a small, high-speed RAM
buffer located between the CPU and main memory; it can increase the
speed of the PC. Read-only memory (ROM) consists of chips with a set of
software instructions supplied by the manufacturer built into them
that enable the computer to perform its I/O operations. The control
unit is also capable of shutting down the computer when the power
supply detects abnormal conditions.
ARITHMETIC-LOGIC UNIT The arithmetic-logic unit (ALU) performs all
arithmetic operations (addition, subtraction, multiplication, and
division) and logic operations. Logic operations test various
conditions encountered during processing and allow for different
actions to be taken based on the results. The data required to perform
the arithmetic and logical functions are inputs from the designated
CPU registers and operands. The ALU relies on basic items to perform
its operations. These include number systems, data routing circuits
(adders/subtracters), timing, instructions, operands, and registers.
Figure 1-2 shows a representative block diagram of an ALU of a
microcomputer.
PRIMARY STORAGE (MAIN MEMORY) The primary storage section (also
called internal storage, main storage, main memory, or just memory)
serves four purposes, including:
1. To hold data transferred from an I/O device to the input storage
area, where it remains until the computer is ready to process it. This
is indicated by the solid arrow on figure 1-1.
2. To hold both the data being processed and the intermediate results
of the arithmetic-logic operations. This is a working storage area
within the storage section. It is sometimes referred to as a scratch
pad memory.
3. To hold the processing results in an output storage area for
transfer to an I/O device.
CPU BUILDING BLOCKS
Registers (IR, PC, ACC)
Control Unit (CU)
Arithmetic Logic Unit (ALU)
ARITHMETIC LOGIC UNIT STRUCTURES
ARITHMETIC LOGIC UNIT SCHEMATIC SYMBOL
A and B: the inputs to the ALU (aka operands)
R: Output or Result
F: Code or Instruction from the Control Unit (aka op-code)
D: Output status; it indicates cases such as carry-in, carry-out, overflow,
division-by-zero, and so on.
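The symbol described above can be sketched in VHDL roughly as follows. This is not
the project's actual ALU: the 8-bit width, the op-code encoding, and the choice of
status bits (a carry/borrow bit and a zero flag) are assumptions made purely for
illustration. Only the port names A, B, F, R and D come from the description above.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity ALU is
  port (
    A, B : in  std_logic_vector(7 downto 0);  -- operands
    F    : in  std_logic_vector(2 downto 0);  -- op-code from the control unit
    R    : out std_logic_vector(7 downto 0);  -- result
    D    : out std_logic_vector(1 downto 0)   -- status: D(1) carry/borrow, D(0) zero
  );
end entity ALU;

architecture BEHAVIOR of ALU is
begin
  process (A, B, F)
    -- One extra bit so that the carry out of bit 7 is not lost.
    variable tmp : unsigned(8 downto 0);
  begin
    -- Assumed op-code encoding (hypothetical).
    case F is
      when "000"  => tmp := resize(unsigned(A), 9) + resize(unsigned(B), 9);
      when "001"  => tmp := resize(unsigned(A), 9) - resize(unsigned(B), 9);
      when "010"  => tmp := resize(unsigned(A and B), 9);
      when "011"  => tmp := resize(unsigned(A or B), 9);
      when "100"  => tmp := resize(unsigned(A xor B), 9);
      when others => tmp := resize(unsigned(not A), 9);
    end case;

    R    <= std_logic_vector(tmp(7 downto 0));
    D(1) <= tmp(8);
    if tmp(7 downto 0) = 0 then
      D(0) <= '1';
    else
      D(0) <= '0';
    end if;
  end process;
end architecture BEHAVIOR;
```

Overflow and division-by-zero flags are left out to keep the sketch short.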
Numerical systems
An ALU must process numbers using the same format as the rest of the digital
circuit. The format of modern processors is almost always the two's complement
binary number representation. Early computers used a wide variety of number
systems, including one's complement, sign-magnitude format, and even true
decimal systems, with ten tubes per digit.
ALUs for each one of these numeric systems had different designs, and that
influenced the current preference for two's complement, as this is the
representation that makes it easier for the ALUs to calculate additions and
subtractions.
The two's-complement number system allows for subtraction to be accomplished
by adding the negative of a number in a very simple way which negates the need
for specialized circuits to do subtraction.
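As a small worked example of this point, the sketch below computes 5 - 3 on 4-bit
values using only inversion and addition, that is, A + (not B) + 1. The entity name
and the chosen operand values are assumptions for illustration.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity TWOS_COMPLEMENT_DEMO is
end entity TWOS_COMPLEMENT_DEMO;

architecture SIM of TWOS_COMPLEMENT_DEMO is
begin
  process
    variable a, b, diff : unsigned(3 downto 0);
  begin
    a := "0101";  -- 5
    b := "0011";  -- 3
    -- Negate b by inverting its bits and adding 1, then add:
    --   not b = 1100;  0101 + 1100 = (1)0001;  0001 + 1 = 0010
    -- The carry out of the top bit is simply discarded.
    diff := a + (not b) + 1;
    assert diff = "0010" report "5 - 3 should be 2" severity failure;
    assert diff = a - b  report "adder-only result differs from a - b" severity failure;
    wait;
  end process;
end architecture SIM;
```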
Practical overview
Most of a processor's operations are performed by one or more ALUs. An ALU
loads data from input registers, an external Control Unit then tells the ALU what
operation to perform on that data, and then the ALU stores its result into an output
register. Other mechanisms move data between these registers and memory.
Simple operation
A simple example arithmetic logic unit (2-bit ALU) that does AND, OR, XOR, and
addition
Most ALUs can perform the following operations:
Integer arithmetic operations (addition, subtraction, and sometimes
multiplication and division, though this is more expensive)
Bitwise logic operations (AND, NOT, OR, XOR)
Bit-shifting operations (shifting or rotating a word by a specified number of
bits to the left or right, with or without sign extension). Shifts can be
interpreted as multiplications by 2 and divisions by 2.
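The shift operations listed above can be tried out with the functions in the IEEE
numeric_std package, as in the following sketch. The entity name and the test values
are assumptions for illustration.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity SHIFT_DEMO is
end entity SHIFT_DEMO;

architecture SIM of SHIFT_DEMO is
begin
  process
    variable u : unsigned(7 downto 0);
    variable s : signed(7 downto 0);
  begin
    u := to_unsigned(12, 8);
    -- Shifting left by one bit doubles the value; right halves it.
    assert shift_left(u, 1)  = to_unsigned(24, 8) report "12 * 2" severity failure;
    assert shift_right(u, 1) = to_unsigned(6, 8)  report "12 / 2" severity failure;

    -- For the signed type, shift_right copies the sign bit
    -- (sign extension), so negative values stay negative.
    s := to_signed(-8, 8);
    assert shift_right(s, 1) = to_signed(-4, 8) report "-8 / 2" severity failure;

    -- A rotate wraps the bit that falls off back in at the other end.
    u := "10000001";
    assert rotate_left(u, 1) = "00000011" report "rotate" severity failure;
    wait;
  end process;
end architecture SIM;
```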
Complex operations
Engineers can design an Arithmetic Logic Unit to calculate any operation. The
more complex the operation, the more expensive the ALU is, the more space it
uses in the processor, the more power it dissipates. Therefore, engineers
compromise. They make the ALU powerful enough to make the processor fast, but
yet not so complex as to become prohibitive. For example, computing the square
root of a number might use:
1. Calculation in a single clock: Design an extraordinarily complex ALU that
calculates the square root of any number in a single step.
2. Calculation pipeline: Design a very complex ALU that calculates the square
root of any number in several steps. The intermediate results go through a
series of circuits arranged like a factory production line. The ALU can
accept new numbers to calculate even before having finished the previous
ones. The ALU can now produce numbers as fast as a single-clock ALU,
although the results start to flow out of the ALU only after an initial delay.
3. Iterative calculation: Design a complex ALU that calculates the square
root through several steps. This usually relies on control from a complex
control unit with built-in microcode.
4. Co-processor: Design a simple ALU in the processor, and sell a separate
specialized and costly processor that the customer can install just beside this
one and that implements one of the options above.
5. Software libraries: Tell the programmers that there is no co-processor and
there is no emulation, so they will have to write their own algorithms to
calculate square roots by software.
6. Software emulation: Emulate the existence of the co-processor, that is,
whenever a program attempts to perform the square root calculation, make
the processor check if there is a co-processor present and use it if there is
one; if there isn't one, interrupt the processing of the program and invoke the
operating system to perform the square root calculation through some
software algorithm.
The options above go from the fastest and most expensive one to the slowest and
least expensive one. Therefore, while even the simplest computer can calculate the
most complicated formula, the simplest computers will usually take a long time
doing that because of the several steps for calculating the formula.
Powerful processors like the Intel Core and AMD64 implement option #1 for
several simple operations, #2 for the most common complex operations and #3 for
the extremely complex operations.
Inputs and outputs
The inputs to the ALU are the data to be operated on (called operands) and a code
from the control unit indicating which operation to perform. Its output is the result
of the computation.
In many designs the ALU also takes or generates as inputs or outputs a set of
condition codes from or to a status register. These codes are used to indicate cases
such as carry-in or carry-out, overflow, divide-by-zero, etc.
ALUs vs. FPUs
A Floating Point Unit (FPU) also performs arithmetic operations between two values,
but it does so for numbers in floating point representation, which is much more
complicated than the two's complement representation used in a typical ALU. In
order to do these calculations, an FPU has several complex circuits built-in,
including some internal ALUs.
In modern practice, engineers typically refer to the ALU as the circuit that
performs integer arithmetic operations (like two's complement and BCD). Circuits
that calculate more complex formats like floating point, complex numbers, etc.
usually receive a more specific name such as FPU.
Logic Gates
Before we get to the overall design of an ALU, we first have to understand the
basics of logic gates. Figure 1 shows the basic logic gates in their graphical
representations. Keep in mind that each of these can be made from transistors by
combining them in different ways. The types of transistors used and how they are
arranged can affect the performance of the gate.
'AND' Gate
'OR' Gate
'XOR' Gate
'NOT' Gate
'NAND' Gate
'NOR' Gate
'XNOR' Gate
Figure 1: Basic logic gates (courtesy of wikipedia.com)
These logic gates work by taking two inputs (one input for the 'NOT' gate) and
producing an output. If we consider the 'AND' gate, the output will be true, or '1' (or
a high voltage), if input #1 and input #2 are both true, and the output will be false,
or '0' (or a low voltage), if one or both inputs are false. Likewise, if we consider the
'OR' gate, the output will be true if input #1 or input #2 (or both) is true. The 'XOR'
gate output will be true if exactly one input is true, but false if both inputs are true
or both are false; this is an implementation of the exclusive 'OR' logic operation.
The 'NOT' gate will output the opposite of the input; so if the input is true the
'NOT' gate's output will be false.
The 'NAND', 'NOR', and 'XNOR' gates are implementations of the 'AND', 'OR',
and 'XOR' gates respectively with a 'NOT' gate placed at the output; so a 'NAND'
gate returns the opposite of what an 'AND' gate returns.
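The gates described above map directly onto VHDL's built-in logical operators. The following is a small sketch; the entity name basic_gates and its port names are hypothetical, and the truth table in the comment simply restates the behaviour described in the text.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical entity exposing one output per basic gate.
entity basic_gates is
    port(
        a, b    : in  std_logic;
        y_and   : out std_logic;
        y_or    : out std_logic;
        y_xor   : out std_logic;
        y_not_a : out std_logic;
        y_nand  : out std_logic;
        y_nor   : out std_logic;
        y_xnor  : out std_logic
    );
end basic_gates;

architecture dataflow of basic_gates is
begin
    --  a b | and or xor | nand nor xnor
    --  0 0 |  0  0   0  |  1    1   1
    --  0 1 |  0  1   1  |  1    0   0
    --  1 0 |  0  1   1  |  1    0   0
    --  1 1 |  1  1   0  |  0    0   1
    y_and   <= a and b;
    y_or    <= a or b;
    y_xor   <= a xor b;
    y_not_a <= not a;      -- the 'NOT' gate has a single input
    y_nand  <= a nand b;   -- same as not (a and b)
    y_nor   <= a nor b;    -- same as not (a or b)
    y_xnor  <= a xnor b;   -- same as not (a xor b)
end dataflow;
```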
These logic functions are by themselves an important part of a CPU's functionality,
but performing logic operations on only two inputs is of limited use. By combining
these gates together we can have devices with more inputs. For example, in Figure
2 I have combined three 'AND' gates. These three 'AND' gates will produce an
output that is true only when all four inputs are true. In essence, this is a 4-input
'AND' gate. You can extrapolate from this and form an 8-input 'AND' gate by
combining two 4-input 'AND's and one 2-input 'AND'.
Figure 2: A 4-input 'AND' device
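The combination of three two-input 'AND' gates can be sketched in VHDL as follows. The entity name and4 and the signal names are hypothetical, and the tree arrangement (two gates feeding a third) is an assumption about how the gates in Figure 2 are wired; a chain of three gates would give the same output.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical 4-input AND built from three 2-input AND gates.
entity and4 is
    port(
        in1, in2, in3, in4 : in  std_logic;
        y                  : out std_logic
    );
end and4;

architecture structural of and4 is
    signal left_pair, right_pair : std_logic;
begin
    left_pair  <= in1 and in2;               -- first AND gate
    right_pair <= in3 and in4;               -- second AND gate
    y          <= left_pair and right_pair;  -- third AND gate: '1' only if all four inputs are '1'
end structural;
```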
Arithmetic
By combining these gates into even more clever configurations we can perform
other useful functions, like addition. Figure 3 shows a typical configuration
referred to as a half-adder. To understand how this adder works we have to think of
the inputs not as true or false but as '1' or '0'. The output of this adder is the sum of
the inputs with a carry bit. If the inputs are '1' and '1' we are adding 1 plus 1. The
output labeled 'SUM' is just an 'XOR' of the inputs, which will be '0'. The output
labeled 'CARRY' is an 'AND' of the inputs, which of course will be '1'. The answer
is therefore 10, which is the binary sum of '1' and '1'. If the inputs are '1' and '0'
the 'SUM' will be '1' and the 'CARRY' will be '0', giving an answer of 01 or just 1.
Figure 3: A half-adder
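The half-adder described above needs only two concurrent assignments in VHDL. This is a minimal sketch; the entity and port names are hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical half-adder: adds two single bits.
entity half_adder is
    port(
        a, b  : in  std_logic;
        sum   : out std_logic;
        carry : out std_logic
    );
end half_adder;

architecture dataflow of half_adder is
begin
    --  a b | carry sum
    --  0 0 |   0    0
    --  0 1 |   0    1
    --  1 0 |   0    1
    --  1 1 |   1    0     (1 + 1 = binary 10)
    sum   <= a xor b;
    carry <= a and b;
end dataflow;
```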
So, if this is performing binary addition why is it called a half-adder? This is
because in order to add binary numbers wider than one bit we need the adder to
be able to take in a carry bit along with the two input bits. This full-adder is shown
in Figure 4. You can see that the full-adder is two half-adders with one additional
'OR' gate. To use a full-adder to add two binary numbers of arbitrary size you
begin with the rightmost bit, called the least significant bit (LSB), of each number
with a carry-in bit of '0'. You then add the two bits, record the sum, and use
the carry-out bit as the carry-in bit when adding the next two bits, moving
towards the most significant bit (MSB). By repeating this process you can add
two binary numbers of any length. This process is known as ripple
carry.
Figure 4: A full-adder
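The full-adder and the ripple carry process can be sketched as two small VHDL design units. The names full_adder and ripple_adder4, the 4-bit width, and the port names are all assumptions made for illustration; the full-adder follows the description above (two half-adders plus one 'OR' gate).

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical full-adder: two half-adders and one OR gate.
entity full_adder is
    port(
        a, b, carry_in : in  std_logic;
        sum, carry_out : out std_logic
    );
end full_adder;

architecture dataflow of full_adder is
    signal s1, c1, c2 : std_logic;
begin
    s1        <= a xor b;          -- first half-adder: sum
    c1        <= a and b;          -- first half-adder: carry
    sum       <= s1 xor carry_in;  -- second half-adder: sum
    c2        <= s1 and carry_in;  -- second half-adder: carry
    carry_out <= c1 or c2;         -- the additional OR gate
end dataflow;

library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical 4-bit ripple carry adder built from four full-adders.
entity ripple_adder4 is
    port(
        a, b      : in  std_logic_vector(3 downto 0);
        carry_in  : in  std_logic;
        sum       : out std_logic_vector(3 downto 0);
        carry_out : out std_logic
    );
end ripple_adder4;

architecture structural of ripple_adder4 is
    signal c : std_logic_vector(4 downto 0);  -- c(i) is the carry into bit i
begin
    c(0) <= carry_in;

    -- One full-adder per bit, LSB first; each carry-out feeds the next carry-in.
    stage : for i in 0 to 3 generate
        fa : entity work.full_adder
            port map(
                a         => a(i),
                b         => b(i),
                carry_in  => c(i),
                sum       => sum(i),
                carry_out => c(i + 1)
            );
    end generate;

    carry_out <= c(4);
end structural;
```

Widening the adder only requires changing the vector ranges and the generate range, which is what makes the ripple carry structure attractive despite its delay.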
Figure 5 shows a half-subtractor (half-sub). In this scenario if we have input 'A'
equal to '1' and input 'B' equal to '0' we want to subtract 0 from 1. You can see that
the 'DIFF' output will be '1' and the 'BORROW' output will be '0'. Like the half-adder,
the half-sub can be used to implement a full-subtractor (full-sub), shown in Figure 6.
Figure 5: A half-sub
Figure 6: A full-sub
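The subtractors can be sketched in the same way as the adders. The entity and port names below are hypothetical, and the gate arrangement is the common textbook one (two half-subtractors plus an 'OR' gate), which is assumed to match Figures 5 and 6.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical half-subtractor: computes a - b for single bits.
entity half_sub is
    port(
        a, b   : in  std_logic;
        diff   : out std_logic;
        borrow : out std_logic
    );
end half_sub;

architecture dataflow of half_sub is
begin
    diff   <= a xor b;
    borrow <= (not a) and b;  -- a borrow is needed only for 0 - 1
end dataflow;

library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical full-subtractor: computes a - b - borrow_in.
entity full_sub is
    port(
        a, b, borrow_in  : in  std_logic;
        diff, borrow_out : out std_logic
    );
end full_sub;

architecture dataflow of full_sub is
    signal d1, b1, b2 : std_logic;
begin
    d1         <= a xor b;                  -- first half-sub: difference
    b1         <= (not a) and b;            -- first half-sub: borrow
    diff       <= d1 xor borrow_in;         -- second half-sub: difference
    b2         <= (not d1) and borrow_in;   -- second half-sub: borrow
    borrow_out <= b1 or b2;
end dataflow;
```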
Arithmetic units are usually grouped together into an ALU which has inputs,
outputs, and control bits which tell the ALU which type of operation to perform.
Figure 7 shows a typical diagram of an ALU. In this diagram A and B are the data
inputs, F is the control input to choose the function, R is the result of the function
applied to A and B, and D is the status of the output so that you know when the
function is done.
The example of the ripple carry addition is an effective method of adding binary
numbers. Let us extrapolate this a little bit and imagine a 32-bit adder. If we want
to add two 32-bit numbers we can start at the LSB and move left as we calculate
the carry bits. If each bit position takes a couple of gate delays to produce its carry,
then the total time taken for the carry to ripple through all 32 bits is significant.
Thankfully this problem has already been solved. The solution: a carry look ahead adder.
In a carry look ahead adder the binary numbers are split into sections, perhaps of 4
bits each. Now each section can begin calculating its carry bits beginning with the
section's LSB and moving towards the section's MSB. Once a carry bit reaches the
MSB of its section the bit can then jump ahead 4-bits at a time instead of
continuing towards the number's MSB one bit at a time. The logic involved with
keeping all of the carries straight is quite complex and becomes more complex as
the size of the sections increases; because of this, more time is spent calculating the
carries of each section and less time on the faster propagation of the carry bits.
However, if the section size is too small then there are so many sections for the
carries to propagate through that there is not much time saved versus the ripple
carry adder. Deciding upon a section size involves a detailed analysis of the gate
and propagation delays which can vary depending on the technology used within
the logic gates.
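As a rough illustration, one 4-bit section of such an adder can be sketched as below. The sketch assumes the usual generate/propagate formulation of carry look ahead, which the text above does not spell out, and the entity name cla4 and its port names are hypothetical. Every carry is written directly in terms of the inputs, so no carry has to wait for the previous bit's adder.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical 4-bit carry look ahead section.
entity cla4 is
    port(
        a, b      : in  std_logic_vector(3 downto 0);
        carry_in  : in  std_logic;
        sum       : out std_logic_vector(3 downto 0);
        carry_out : out std_logic
    );
end cla4;

architecture dataflow of cla4 is
    signal g, p : std_logic_vector(3 downto 0);
    signal c    : std_logic_vector(4 downto 0);  -- c(i) is the carry into bit i
begin
    g <= a and b;  -- generate: bit i produces a carry by itself
    p <= a xor b;  -- propagate: bit i passes an incoming carry along

    c(0) <= carry_in;
    c(1) <= g(0) or (p(0) and c(0));
    c(2) <= g(1) or (p(1) and g(0)) or (p(1) and p(0) and c(0));
    c(3) <= g(2) or (p(2) and g(1)) or (p(2) and p(1) and g(0))
            or (p(2) and p(1) and p(0) and c(0));
    c(4) <= g(3) or (p(3) and g(2)) or (p(3) and p(2) and g(1))
            or (p(3) and p(2) and p(1) and g(0))
            or (p(3) and p(2) and p(1) and p(0) and c(0));

    sum       <= p xor c(3 downto 0);
    carry_out <= c(4);
end dataflow;
```

The carry equations grow by one term per bit, which shows why larger sections make the look ahead logic more complex.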
The carry look ahead technique is one optimization that engineers can make to the
ALU. There are many others. For instance, to do multiplication one would
normally just add the number to itself over and over, but there are optimization
techniques that can be employed to speed this process up. These are some of the
differences between the ALUs of various processors and a major reason why some
processors are better at certain types of operations than others. For instance, a GPU
will have an ALU which is optimized for the arithmetic often performed for the
display of graphics while a CPU will have an ALU designed to be optimized for
the most common operations performed by users.
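One well-known multiplication optimization is shift-and-add, sketched below for 4-bit unsigned operands. The text above does not name a particular technique, so the choice of shift-and-add is an assumption for illustration, as are the entity name shift_add_mult4 and the port names. It needs at most four additions instead of up to fifteen repeated additions.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical 4 x 4 unsigned shift-and-add multiplier.
entity shift_add_mult4 is
    port(
        a, b    : in  std_logic_vector(3 downto 0);
        product : out std_logic_vector(7 downto 0)
    );
end shift_add_mult4;

architecture behavioral of shift_add_mult4 is
begin
    process(a, b)
        variable acc     : unsigned(7 downto 0);  -- running total
        variable shifted : unsigned(7 downto 0);  -- a shifted left by i places
    begin
        acc     := (others => '0');
        shifted := resize(unsigned(a), 8);
        for i in 0 to 3 loop
            -- Add the shifted multiplicand only where the multiplier bit is '1'.
            if b(i) = '1' then
                acc := acc + shifted;
            end if;
            shifted := shift_left(shifted, 1);
        end loop;
        product <= std_logic_vector(acc);
    end process;
end behavioral;
```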
PROGRAM FOR ALU IN VHDL
--------------------------------------------------------------------------------
-- Title       : alu
-- Design      : arithmetic logic unit
-- Author      : NARESH DOBAL
-- Company     : NSD
--------------------------------------------------------------------------------
-- File        : alu.vhd
-- Generated   : Fri Nov 19 12:41:19 2010
-- From        : interface description file
-- By          : Itf2Vhdl ver. 1.20
--------------------------------------------------------------------------------
-- Description :
--------------------------------------------------------------------------------
--{{ Section below this comment is automatically maintained
-- and may be overwritten
--{entity {alu} architecture {alu_arc}}
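library IEEE;
use IEEE.STD_LOGIC_1164.all;
-- std_logic_arith and std_logic_unsigned are vendor (Synopsys) packages, not part
-- of the IEEE standard; they supply "+" and "-" on STD_LOGIC_VECTOR.
use ieee.std_logic_arith.all;
use ieee.std_logic_unsigned.all;
entity alu is
    port(
        din1 : in STD_LOGIC_VECTOR(3 downto 0);   -- first operand
        din2 : in STD_LOGIC_VECTOR(3 downto 0);   -- second operand
        sel  : in STD_LOGIC_VECTOR(3 downto 0);   -- operation select code
        dout : out STD_LOGIC_VECTOR(3 downto 0)   -- 4-bit result (carry/borrow is discarded)
    );
end alu;
--}} End of automatically maintained section
architecture alu_arc of alu is
begin
    -- Combinational ALU: sel chooses one of 16 operations on din1 and din2.
    -- Arithmetic results wrap around at 4 bits.
    with sel select
        dout <= din1 and din2  when "0000",   -- bitwise AND
                din1 or din2   when "0001",   -- bitwise OR
                din1 nand din2 when "0010",   -- bitwise NAND
                din1 nor din2  when "0011",   -- bitwise NOR
                din1 xor din2  when "0100",   -- bitwise XOR
                din1 xnor din2 when "0101",   -- bitwise XNOR
                not din1       when "0110",   -- invert din1
                not din2       when "0111",   -- invert din2
                din1 + "0001"  when "1000",   -- increment din1 by 1
                din2 + "0001"  when "1001",   -- increment din2 by 1
                din1 + din2    when "1010",   -- add
                din1 - din2    when "1011",   -- subtract
                din2 - "0010"  when "1100",   -- din2 minus 2
                din1 - "0010"  when "1101",   -- din1 minus 2
                din1 + "0010"  when "1110",   -- din1 plus 2
                din2 + "0010"  when others;   -- din2 plus 2 ("1111" and any non-binary sel value)
end alu_arc;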
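To produce a waveform for the ALU, a test bench is needed to drive its inputs. The following is a minimal self-checking sketch, not the test bench used for this report; the name alu_tb, the operand values, and the 10 ns step are all assumptions. It instantiates the alu entity above from the work library and checks a few of the sixteen operations.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical test bench for the alu entity.
entity alu_tb is
end alu_tb;

architecture sim of alu_tb is
    signal din1, din2, sel, dout : std_logic_vector(3 downto 0);
begin
    uut : entity work.alu
        port map(din1 => din1, din2 => din2, sel => sel, dout => dout);

    stimulus : process
    begin
        din1 <= "0101";  -- 5
        din2 <= "0011";  -- 3

        sel <= "0000"; wait for 10 ns;  -- AND: 0101 and 0011
        assert dout = "0001" report "AND failed" severity error;

        sel <= "0100"; wait for 10 ns;  -- XOR: 0101 xor 0011
        assert dout = "0110" report "XOR failed" severity error;

        sel <= "1010"; wait for 10 ns;  -- add: 5 + 3 = 8
        assert dout = "1000" report "ADD failed" severity error;

        sel <= "1011"; wait for 10 ns;  -- subtract: 5 - 3 = 2
        assert dout = "0010" report "SUB failed" severity error;

        sel <= "1111"; wait for 10 ns;  -- others branch: din2 + 2 = 5
        assert dout = "0101" report "din2 + 2 failed" severity error;

        report "alu_tb finished" severity note;
        wait;  -- stop the stimulus process
    end process;
end sim;
```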
OUTPUT WAVE-FORM OF ALU
CIRCUIT DIAGRAM FOR ABOVE CODE GENERATED BY SYNTHESIS TOOL
(RTL LAYOUT DESIGN)