COMPUTER ORGANIZATION AND DESIGN
5th Edition
The Hardware/Software Interface
Chapter 3
Arithmetic for Computers
§3.1 Introduction
Arithmetic for Computers
Operations on integers
Addition and subtraction
Multiplication and division
Floating-point real numbers
Representation and operations
§3.3 Multiplication
Multiplication
Start with long-multiplication approach
  Decimal:                          Binary:

      1000   multiplicand              1000   multiplicand
    × 4012   multiplier              × 1011   multiplier
    ------                           ------
      2000                             1000
     1000                             1000
    0000                             0000
   4000                             1000
   -------                          -------
   4012000   product                1011000   product

Binary makes it easy:
  0 ⇒ place 0       (0 × multiplicand)
  1 ⇒ place a copy  (1 × multiplicand)
Multiplication Hardware
[Figure: multiplication hardware. The Product register is initially 0.]
Example
Using 4-bit numbers to save space,
multiply 2ten × 3ten, or 0010two × 0011two
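A minimal C sketch of the shift-and-add loop the hardware performs for this example (illustrative code, not from the slides; variable names are chosen here for clarity):

#include <stdint.h>
#include <stdio.h>

/* Shift-and-add multiplication: the multiplicand shifts left, the
 * multiplier shifts right, and a copy of the multiplicand is added
 * whenever the multiplier bit examined is 1. */
int main(void)
{
    uint32_t multiplicand = 0x2;   /* 0010two */
    uint32_t multiplier   = 0x3;   /* 0011two */
    uint32_t product      = 0;     /* initially 0 */

    for (int i = 0; i < 4; i++) {  /* 4-bit operands: 4 iterations */
        if (multiplier & 1)
            product += multiplicand;   /* 1: add a copy of the multiplicand */
        multiplicand <<= 1;            /* shift multiplicand left 1 bit */
        multiplier  >>= 1;             /* shift multiplier right 1 bit */
    }
    printf("product = %u\n", product); /* prints 6 */
    return 0;
}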
Optimized Multiplier
Perform steps in parallel: add/shift
One cycle per partial-product addition
That's OK if the frequency of multiplications is low
Example
Multiply 0010two×0011two using optimized
multiplier hardware
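For reference, a cycle-by-cycle sketch of the optimized hardware on this example (Multiplicand = 0010two; the multiplier starts in the right half of the Product register):

Iteration  Step                                      Product
0          initial values                            0000 0011
1          LSB = 1: add multiplicand to left half    0010 0011
           shift Product right 1 bit                 0001 0001
2          LSB = 1: add multiplicand to left half    0011 0001
           shift Product right 1 bit                 0001 1000
3          LSB = 0: no add                           0001 1000
           shift Product right 1 bit                 0000 1100
4          LSB = 0: no add                           0000 1100
           shift Product right 1 bit                 0000 0110  = 6ten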
Signed Multiplication
The simplest approach:
Negate all negative operands at the beginning,
perform unsigned multiplication on the resulting numbers,
and then negate the product if necessary.
Disadvantage:
Extra clock cycles may be needed to negate the
multiplicand, the multiplier, and the double-length
product.
Booth’s Algorithm
E.g., 2ten × 6ten = 0010two × 0110two
  6 = 0110two; also 6 = –2 + 8, i.e. 0110two = –0010two + 1000two
Consider 01110two = 1×2^3 + 1×2^2 + 1×2^1   (three additions)
Faster calculation:
  01110two = 1×2^4 – 1×2^1   (one addition and one subtraction)
  14 = 16 – 2.  What about 011110two?
Generalizing, for a run of 1s:

  bit position:   9 8 7 6 5 4 3 2 1 0
    0111111000two = ?
    0111111111two = 2^9 – 1
  –        111two = 2^3 – 1
    0111111000two = (2^9 – 1) – (2^3 – 1) = 2^9 – 2^3

A run of 1s from bit n up to bit m, e.g. 000111…111000…000two,
has the value 2^(m+1) – 2^n
Booth’s Algorithm
The key to Booth’s insight:
classify groups of bits into the beginning, the middle,
or the end of a run of 1s
Booth’s Algorithm
Booth’s algorithm
1. Depending on the current and previous bits, do one of
the following:
  00: middle of a string of 0s    ⇒  no arithmetic operation
  01: end of a string of 1s       ⇒  add the multiplicand to the left half of the product
  10: beginning of a string of 1s ⇒  subtract the multiplicand from the left half of the product
  11: middle of a string of 1s    ⇒  no arithmetic operation
2. Shift the Product register right 1 bit
Booth’s Algorithm
Requirements:
  Start with a 0 for the bit to the right of the rightmost bit
  The Booth operation is chosen by examining 2 bits: the current bit and the bit to its right
  Extend the sign when the product is shifted to the right
E.g., 2ten × 6ten = 0010two × 0110two  (note the sign extension on each right shift)
Example
Let’s try Booth’s algorithm with negative numbers:
2ten × –3ten = –6ten, or 0010two × 1101two = 1111 1010two
(note the sign extension on each right shift)
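A minimal C sketch of the radix-2 Booth loop described above, for 32-bit operands (illustrative code, not from the text; a uint64_t models the 64-bit Product register so the arithmetic right shift and the dropped carry of the left-half add can be written out explicitly):

#include <stdint.h>
#include <stdio.h>

static int64_t booth_multiply(int32_t multiplicand, int32_t multiplier)
{
    uint64_t md = (uint64_t)(uint32_t)multiplicand << 32; /* aligned with left half  */
    uint64_t p  = (uint32_t)multiplier;                   /* right half = multiplier */
    int prev = 0;                                         /* implicit 0 past the LSB */

    for (int i = 0; i < 32; i++) {
        int cur = (int)(p & 1);
        if (cur == 0 && prev == 1)        /* 01: end of a run of 1s  -> add      */
            p += md;
        else if (cur == 1 && prev == 0)   /* 10: start of a run of 1s -> subtract */
            p -= md;
        /* 00 and 11: middle of a run -> no arithmetic operation */
        prev = cur;
        /* arithmetic shift right by 1: replicate the sign bit (bit 63) */
        p = (p >> 1) | (p & 0x8000000000000000ULL);
    }
    return (int64_t)p;
}

int main(void)
{
    printf("2 x -3 = %lld\n", (long long)booth_multiply(2, -3));  /* prints -6 */
    return 0;
}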
2-Bit Booth Encoding
Using more bits for faster multiplies
b: multiplicand

  Current bits   Previous bit   Operation
  00             0              no op
  00             1              +b
  01             0              +b
  01             1              +2b
  10             0              –2b
  10             1              –b
  11             0              –b
  11             1              no op
MIPS Multiplication
Two 32-bit registers for product
HI: most-significant 32 bits
LO: least-significant 32 bits
Instructions
mult rs, rt / multu rs, rt
64-bit product in HI/LO
mfhi rd / mflo rd
Move from HI/LO to rd
Can test HI value to see if product overflows 32 bits
mul rd, rs, rt
Least-significant 32 bits of product → rd
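A C sketch of that overflow test (illustrative; a 64-bit product stands in for the HI/LO pair):

#include <stdint.h>
#include <stdbool.h>

/* True if a*b does not fit in a signed 32-bit result, i.e. HI is not
 * simply the sign extension of LO. */
bool mul32_overflows(int32_t a, int32_t b)
{
    int64_t full = (int64_t)a * (int64_t)b;            /* what mult leaves in HI/LO */
    int32_t lo = (int32_t)full;                        /* LO: low 32 bits  */
    int32_t hi = (int32_t)((uint64_t)full >> 32);      /* HI: high 32 bits */
    return hi != (lo < 0 ? -1 : 0);                    /* HI must be LO's sign extension */
}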
§3.4 Division
Division
Check for 0 divisor
Long division approach
  If divisor ≤ dividend bits
    1 bit in quotient, subtract
  Otherwise
    0 bit in quotient, bring down next dividend bit

           1001      quotient
         --------
  1000 ) 1001010     dividend
        -1000
            10
            101
            1010
           -1000
              10     remainder

Restoring division
  Do the subtract, and if the remainder goes < 0, add the divisor back
Signed division
  Divide using absolute values
  Adjust the signs of the quotient and remainder as required
n-bit operands yield n-bit quotient and remainder
Division Hardware
[Figure: division hardware. The Divisor register initially holds the divisor in its left half; the Remainder register initially holds the dividend. Why the left half?]
Example
Using 4-bit operands: divide 7ten by 2ten, or 0000 0111two by 0010two
Optimized Divider
Start: place the dividend in the Remainder register
1. Shift the Remainder register left 1 bit
2. Subtract the Divisor from the left half of the Remainder and
   place the result in the left half of the Remainder
3. Test the Remainder:
   3a. Remainder ≥ 0: shift the Remainder left, setting the new rightmost bit to 1
   3b. Remainder < 0: restore the original value by adding the Divisor back to the
       left half of the Remainder and placing the sum there; then shift the
       Remainder left, setting the new rightmost bit to 0
Repeat steps 2–3 until the 32nd repetition; then done:
   shift the left half of the Remainder right 1 bit

One cycle per partial-remainder subtraction
Looks a lot like a multiplier!
  The same hardware can be used for both
Example
Using optimized divider hardware to divide
7ten by 2ten or 0000 0111two by 0010two
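A minimal C sketch of this example following the steps above (illustrative names; the 8-bit Remainder register is held in a wider C integer, and testing before subtracting has the same effect as subtracting and then restoring):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    const int N = 4;                    /* operand width */
    uint32_t divisor = 2;               /* 0010two */
    uint32_t rem     = 7;               /* dividend 0111two in the right half */
    uint32_t div_hi  = divisor << N;    /* divisor aligned with the left half */

    rem <<= 1;                          /* step 1: initial left shift */
    for (int i = 0; i < N; i++) {
        if (rem >= div_hi) {            /* left half >= divisor? */
            rem -= div_hi;              /* step 2: subtract */
            rem = (rem << 1) | 1;       /* step 3a: shift left, new bit = 1 */
        } else {
            rem = (rem << 1);           /* step 3b: skip (restore), new bit = 0 */
        }
    }
    printf("quotient = %u, remainder = %u\n",
           rem & 0xF,                   /* right half of the register */
           rem >> (N + 1));             /* left half shifted right 1 bit */
    return 0;                           /* prints: quotient = 3, remainder = 1 */
}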
Signed Division
Simplest solution:
remember the signs of the divisor and dividend
and then negate the quotient if the signs disagree
Note: the dividend and the remainder must have
the same signs!
Example
 +7 ÷ +2:  Quotient = +3, Remainder = +1
 –7 ÷ +2:  Quotient = –3, Remainder = –1
 +7 ÷ –2:  Quotient = –3, Remainder = +1
 –7 ÷ –2:  Quotient = +3, Remainder = –1
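C's integer division (since C99) follows the same convention: the quotient truncates toward zero and the remainder takes the sign of the dividend, so the four cases above can be checked directly:

#include <stdio.h>

int main(void)
{
    printf(" 7 /  2 = %2d,  7 %%  2 = %2d\n",  7 /  2,  7 %  2);  /*  3,  1 */
    printf("-7 /  2 = %2d, -7 %%  2 = %2d\n", -7 /  2, -7 %  2);  /* -3, -1 */
    printf(" 7 / -2 = %2d,  7 %% -2 = %2d\n",  7 / -2,  7 % -2);  /* -3,  1 */
    printf("-7 / -2 = %2d, -7 %% -2 = %2d\n", -7 / -2, -7 % -2);  /*  3, -1 */
    return 0;
}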
MIPS Division
Use HI/LO registers for result
HI: 32-bit remainder
LO: 32-bit quotient
Instructions
div rs, rt / divu rs, rt
No overflow or divide-by-0 checking
Software must perform checks if required
Use mfhi, mflo to access result
§3.5 Floating Point
Floating Point
Representation for non-integral numbers
Including very small and very large numbers
Like scientific notation
–2.34 × 10^56    normalized
+0.002 × 10^–4   not normalized
+987.02 × 10^9
In binary
±1.xxxxxxxtwo × 2^yyyy
Types float and double in C
Floating Point Standard
Defined by IEEE Std 754-1985
Developed in response to divergence of
representations
Portability issues for scientific code
Now almost universally adopted
Two representations
Single precision (32-bit)
Double precision (64-bit)
IEEE Floating-Point Format
  | S | Exponent | Fraction |
  single: 1-bit S, 8-bit Exponent, 23-bit Fraction
  double: 1-bit S, 11-bit Exponent, 52-bit Fraction

  x = (–1)^S × (1 + Fraction) × 2^(Exponent – Bias)
S: sign bit (0 non-negative, 1 negative)
Normalize significand: 1.0 ≤ |significand| < 2.0
Always has a leading pre-binary-point 1 bit, so no need to
represent it explicitly (hidden bit)
Significand is Fraction with the “1.” restored
Exponent: excess representation: actual exponent + Bias
Ensures exponent is unsigned
Single: Bias = 127; Double: Bias = 1023
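A small C sketch of this encoding (illustrative code, assuming float is a 32-bit IEEE 754 single; it extracts the three fields and removes the bias):

#include <stdint.h>
#include <string.h>
#include <stdio.h>

int main(void)
{
    float f = -0.75f;
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);          /* reinterpret the bit pattern */

    uint32_t sign     = bits >> 31;          /* 1 bit  */
    uint32_t exponent = (bits >> 23) & 0xFF; /* 8 bits, biased by 127 */
    uint32_t fraction = bits & 0x7FFFFF;     /* 23 bits, hidden 1 not stored */

    printf("S=%u  Exponent=%u (actual %d)  Fraction=0x%06X\n",
           sign, exponent, (int)exponent - 127, fraction);
    return 0;
}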
Single-Precision Range
Exponents 00000000 and 11111111 reserved
Smallest value
  Exponent: 00000001 ⇒ actual exponent = 1 – 127 = –126
  Fraction: 000…00 ⇒ significand = 1.0
  ±1.0 × 2^–126 ≈ ±1.2 × 10^–38
Largest value
  Exponent: 11111110 ⇒ actual exponent = 254 – 127 = +127
  Fraction: 111…11 ⇒ significand ≈ 2.0
  ±2.0 × 2^+127 ≈ ±3.4 × 10^+38
Double-Precision Range
Exponents 0000…00 and 1111…11 reserved
Smallest value
  Exponent: 00000000001 ⇒ actual exponent = 1 – 1023 = –1022
  Fraction: 000…00 ⇒ significand = 1.0
  ±1.0 × 2^–1022 ≈ ±2.2 × 10^–308
Largest value
  Exponent: 11111111110 ⇒ actual exponent = 2046 – 1023 = +1023
  Fraction: 111…11 ⇒ significand ≈ 2.0
  ±2.0 × 2^+1023 ≈ ±1.8 × 10^+308
Floating-Point Precision
Relative precision
all fraction bits are significant
Single: approx 2^–23
  Equivalent to 23 × log10(2) ≈ 23 × 0.3 ≈ 6 decimal digits of precision
Double: approx 2^–52
  Equivalent to 52 × log10(2) ≈ 52 × 0.3 ≈ 16 decimal digits of precision
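As a quick check in C (illustrative; <float.h> reports the machine epsilons 2^–23 and 2^–52 and the guaranteed decimal digits, 6 and 15, close to the estimates above):

#include <stdio.h>
#include <float.h>

int main(void)
{
    printf("float : epsilon = %g, %d decimal digits\n", FLT_EPSILON, FLT_DIG);
    printf("double: epsilon = %g, %d decimal digits\n", DBL_EPSILON, DBL_DIG);
    return 0;
}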
Floating-Point Example
Represent –0.75
–0.75 = (–1)^1 × 1.1two × 2^–1
S = 1
Fraction = 1000…00two
Exponent = –1 + Bias
  Single: –1 + 127 = 126 = 01111110two
  Double: –1 + 1023 = 1022 = 01111111110two
Single: 1 01111110 1000…00
Double: 1 01111111110 1000…00
Floating-Point Example
What number is represented by the single-
precision float
1 10000001 01000…00
S = 1
Fraction = 01000…00two
Exponent = 10000001two = 129
x = (–1)^1 × (1 + 0.01two) × 2^(129 – 127)
  = (–1) × 1.25 × 2^2
  = –5.0
Infinities and NaNs
Exponent = 111...1, Fraction = 000...0
±Infinity
Can be used in subsequent calculations,
avoiding need for overflow check
Exponent = 111...1, Fraction ≠ 000...0
Not-a-Number (NaN)
Indicates illegal or undefined result
e.g., 0.0 / 0.0
Can be used in subsequent calculations
Denormal Numbers
Exponent = 000...0 ⇒ hidden bit is 0
  x = (–1)^S × (0 + Fraction) × 2^–126  (single precision)
Smaller than normal numbers
  allow for gradual underflow, with diminishing precision
The smallest single-precision de-normalized number is
  ±2^–23 × 2^–126 = ±2^–149 ≈ ±1.4 × 10^–45
De-normal with Fraction = 000...0
  x = (–1)^S × (0 + 0) × 2^–126 = ±0.0
Two representations of 0.0!
Floating-Point Summary
  Single precision        Double precision        Object represented
  Exponent  Fraction      Exponent  Fraction
  0         0             0         0             ±0
  0         non-zero      0         non-zero      ±de-normalized number
  1–254     anything      1–2046    anything      ±normalized floating-point number
  255       0             2047      0             ±infinity
  255       non-zero      2047      non-zero      NaN (Not a Number)

The smallest positive single-precision normalized number is 1.0 × 2^–126 ≈ 1.2 × 10^–38
The smallest positive single-precision de-normalized number is 2^–23 × 2^–126 = 2^–149, or ≈ 1.4 × 10^–45
Floating-Point Addition
Consider a 4-digit decimal example
9.999 × 10^1 + 1.610 × 10^–1
1. Align decimal points
   Shift the number with the smaller exponent
   9.999 × 10^1 + 0.016 × 10^1
2. Add significands
   9.999 × 10^1 + 0.016 × 10^1 = 10.015 × 10^1
3. Normalize the result & check for over/underflow
   1.0015 × 10^2
4. Round and renormalize if necessary
   1.002 × 10^2  (fits in 4 decimal digits, so no renormalization needed)
Floating-Point Addition
Now consider a 4-digit binary example
1.000two × 2^–1 + (–1.110two × 2^–2)   (i.e., 0.5 + –0.4375)
1. Align binary points
   Shift the number with the smaller exponent
   1.000two × 2^–1 + (–0.111two × 2^–1)
2. Add significands
   1.000two × 2^–1 + (–0.111two × 2^–1) = 0.001two × 2^–1
3. Normalize the result & check for over/underflow
   1.000two × 2^–4, with no over/underflow
4. Round and renormalize if necessary
   1.000two × 2^–4 (no change) = 0.0625
FP Adder Hardware
Much more complex than integer adder
Doing it in one clock cycle would take too
long
Much longer than integer operations
Slower clock would penalize all instructions
FP adder usually takes several cycles
Can be pipelined
FP Adder Hardware
[Figure: FP adder hardware, annotated with Steps 1-4 of the addition algorithm. Mux1 selects the larger exponent; Mux2 selects the fraction of the number with the smaller exponent; Mux3 selects the fraction of the number with the larger exponent.]
Floating-Point Multiplication
Consider a 4-digit decimal example
(1.110 × 10^10) × (9.200 × 10^–5)
1. Add exponents
   For biased exponents, subtract the bias from the sum
   New exponent = 10 + –5 = 5
2. Multiply significands
   1.110 × 9.200 = 10.212  ⇒  10.212 × 10^5
3. Normalize the result & check for over/underflow
   1.0212 × 10^6
4. Round and renormalize if necessary
   1.021 × 10^6
5. Determine the sign of the result from the signs of the operands
   +1.021 × 10^6
Floating-Point Multiplication
Now consider a 4-digit binary example
(1.000two × 2^–1) × (–1.110two × 2^–2)   (i.e., 0.5 × –0.4375)
1. Add exponents
   Unbiased: –1 + –2 = –3
   Biased: (–1 + 127) + (–2 + 127) – 127 = –3 + 127
2. Multiply significands
   1.000two × 1.110two = 1.110two  ⇒  1.110two × 2^–3
3. Normalize the result & check for over/underflow
   1.110two × 2^–3 (no change), with no over/underflow
4. Round and renormalize if necessary
   1.110two × 2^–3 (no change)
5. Determine sign: +ve × –ve ⇒ –ve
   –1.110two × 2^–3 = –0.21875
FP Arithmetic Hardware
FP multiplier is of similar complexity to FP
adder
But uses a multiplier for significands instead of
an adder
FP arithmetic hardware usually does
Addition, subtraction, multiplication, division,
reciprocal, square-root
FP integer conversion
Operations usually takes several cycles
Can be pipelined
FP Instructions in MIPS
FP hardware is coprocessor 1
Adjunct processor that extends the ISA
Separate FP registers
32 single-precision: $f0, $f1, … $f31
Paired for double-precision: $f0/$f1, $f2/$f3, …
Release 2 of the MIPS ISA supports 32 × 64-bit FP registers
FP instructions operate only on FP registers
Programs generally don’t do integer ops on FP data,
or vice versa
More registers with minimal code-size impact
FP load and store instructions
lwc1, ldc1, swc1, sdc1
e.g., ldc1 $f8, 32($sp)
FP Instructions in MIPS
Single-precision arithmetic
add.s, sub.s, mul.s, div.s
e.g., add.s $f0, $f1, $f6
Double-precision arithmetic
add.d, sub.d, mul.d, div.d
e.g., mul.d $f4, $f4, $f6
Single- and double-precision comparison
c.xx.s, c.xx.d (xx is eq, lt, le, …)
Sets or clears FP condition-code bit
e.g. c.lt.s $f3, $f4
Branch on FP condition code true or false
bc1t, bc1f
e.g., bc1t TargetLabel
FP Example: Array Multiplication
X=X+Y×Z
All 32 × 32 matrices, 64-bit double-precision elements
C code:
void mm (double x[][32],
         double y[][32], double z[][32]) {
  int i, j, k;
  for (i = 0; i != 32; i = i + 1)
    for (j = 0; j != 32; j = j + 1)
      for (k = 0; k != 32; k = k + 1)
        x[i][j] = x[i][j]
                + y[i][k] * z[k][j];
}
Addresses of x, y, z in $a0, $a1, $a2, and
i, j, k in $s0, $s1, $s2
FP Example: Array Multiplication
MIPS code:
li $t1, 32 # $t1 = 32 (row size/loop end)
li $s0, 0 # i = 0; initialize 1st for loop
L1: li $s1, 0 # j = 0; restart 2nd for loop
L2: li $s2, 0 # k = 0; restart 3rd for loop
sll $t2, $s0, 5 # $t2 = i * 32 (size of row of x)
addu $t2, $t2, $s1 # $t2 = i * size(row) + j
sll $t2, $t2, 3 # $t2 = byte offset of [i][j]
addu $t2, $a0, $t2 # $t2 = byte address of x[i][j]
l.d $f4, 0($t2) # $f4 = 8 bytes of x[i][j]
L3: sll $t0, $s2, 5 # $t0 = k * 32 (size of row of z)
addu $t0, $t0, $s1 # $t0 = k * size(row) + j
sll $t0, $t0, 3 # $t0 = byte offset of [k][j]
addu $t0, $a2, $t0 # $t0 = byte address of z[k][j]
l.d $f16, 0($t0) # $f16 = 8 bytes of z[k][j]
…
FP Example: Array Multiplication
…
sll $t0, $s0, 5 # $t0 = i*32 (size of row of y)
addu $t0, $t0, $s2 # $t0 = i*size(row) + k
sll $t0, $t0, 3 # $t0 = byte offset of [i][k]
addu $t0, $a1, $t0 # $t0 = byte address of y[i][k]
l.d $f18, 0($t0) # $f18 = 8 bytes of y[i][k]
mul.d $f16, $f18, $f16 # $f16 = y[i][k] * z[k][j]
add.d $f4, $f4, $f16 # $f4 = x[i][j] + y[i][k]*z[k][j]
addiu $s2, $s2, 1 # k = k + 1
bne $s2, $t1, L3 # if (k != 32) go to L3
s.d $f4, 0($t2) # x[i][j] = $f4
addiu $s1, $s1, 1 # j = j + 1
bne $s1, $t1, L2 # if (j != 32) go to L2
addiu $s0, $s0, 1 # i = i + 1
bne $s0, $t1, L1 # if (i != 32) go to L1
Example: Rounding with Guard Digits
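For instance (an illustrative decimal case, three significant digits): add 2.56 × 10^0 to 2.34 × 10^2. With two guard digits the shifted operand is kept as 0.0256 × 10^2, the sum is 2.3656 × 10^2, and rounding gives 2.37 × 10^2. Without guard digits the shifted operand is truncated to 0.02 × 10^2, giving 2.36 × 10^2, which is off by one unit in the last digit.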
Interpretation of Data
The BIG Picture
Bits have no inherent meaning
Interpretation depends on the instructions
applied
Computer representations of numbers
Finite range and precision
Need to account for this in programs
§3.6 Parallelism and Computer Arithmetic: Associativity
Associativity
Parallel programs may interleave
operations in unexpected orders
Assumptions of associativity may fail
(x + y) + z  vs  x + (y + z), with x = –1.50E+38, y = 1.50E+38, z = 1.0:

  (x + y) + z = 0.00E+00 + 1.0       = 1.00E+00
  x + (y + z) = –1.50E+38 + 1.50E+38 = 0.00E+00
                (the 1.0 is lost when rounded into y + z)
Need to validate parallel programs under
varying degrees of parallelism
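The effect is easy to reproduce in C with ordinary IEEE single-precision evaluation (values as in the example above):

#include <stdio.h>

int main(void)
{
    float x = -1.5e38f, y = 1.5e38f, z = 1.0f;
    printf("(x+y)+z = %.2e\n", (x + y) + z);   /* 1.00e+00 */
    printf("x+(y+z) = %.2e\n", x + (y + z));   /* 0.00e+00: z is lost in y + z */
    return 0;
}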
§3.8 Fallacies and Pitfalls
Right Shift and Division
Left shift by i places multiplies an integer
by 2^i
Right shift divides by 2^i?
Only for unsigned integers
For signed integers
Arithmetic right shift: replicate the sign bit
  e.g., –5 / 4:
    11111011two >> 2 = 11111110two = –2
    Rounds toward –∞
  c.f. logical shift: 11111011two >>> 2 = 00111110two = +62
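A small C illustration (note: right-shifting a negative signed integer is implementation-defined in C; on the usual two's-complement machines it is the arithmetic shift shown here):

#include <stdio.h>

int main(void)
{
    int x = -5;
    printf("-5 / 4        = %d\n", x / 4);                 /* -1: C truncates toward zero   */
    printf("-5 >> 2       = %d\n", x >> 2);                /* typically -2: rounds toward -inf */
    printf("logical shift = %d\n", (unsigned char)x >> 2); /* 0xFB >> 2 = 62 on 8-bit bytes */
    return 0;
}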
Who Cares About FP Accuracy?
Important for scientific code
But for everyday consumer use?
“My bank balance is out by 0.0002¢!”
The Intel Pentium FDIV bug
The market expects accuracy
See Colwell, The Pentium Chronicles
§3.9 Concluding Remarks
Concluding Remarks
ISAs support arithmetic
Signed and unsigned integers
Floating-point approximation to reals
Bounded range and precision
Operations can overflow and underflow
MIPS ISA
Core instructions: 54 most frequently used
Other instructions: less frequent