Computer Technology
and Instructions
Sam Amiri
The Computer Revolution
Progress in computer technology
Underpinned by Moore’s Law
Makes novel applications feasible
Computers in automobiles
Cell phones
Human genome project
World Wide Web
Search Engines
Computers are pervasive
2
Classes of Computers
Desktop computers
General purpose, variety of software
Subject to cost/performance tradeoff
Server computers
Network based
High capacity, performance, reliability
Range from small servers to building sized
Embedded computers
Hidden as components of systems
Stringent power/performance/cost constraints
3
The Processor Market
4
Understanding Performance
Algorithm
Determines number of operations executed
Programming language, compiler, architecture
Determine number of machine instructions executed per operation
Processor and memory system
Determine how fast instructions are executed
I/O system (including OS)
Determines how fast I/O operations are executed
5
Below Your Program
Application software
Written in high-level language
System software
Compiler: translates HLL code to machine code
Operating System: service code
Handling input/output
Managing memory and storage
Scheduling tasks & sharing resources
Hardware
Processor, memory, I/O controllers
6
Levels of Program Code
High-level language
Level of abstraction closer to problem domain
Provides for productivity and portability
Assembly language
Textual representation of instructions
Hardware representation
Binary digits (bits)
Encoded instructions and data
7
Components of a Computer
Same components for all kinds
of computer
Desktop, server, embedded
Input/output includes
User-interface devices
Display, keyboard, mouse
Storage devices
Hard disk, CD/DVD, flash
Network adapters
For communicating with other computers
8
Inside the Processor (CPU)
Datapath: performs operations on data
Control: sequences datapath, memory, ...
Cache memory
Small fast SRAM memory for immediate access to data
9
Abstractions
Abstraction helps us deal with complexity
Hide lower-level detail
Instruction set architecture (ISA)
The hardware/software interface
Application binary interface (ABI)
The ISA plus system software interface
Implementation
The details underlying and interface
10
Defining Performance
Response time
How long it takes to do a task
Throughput
Total work done per unit time
e.g., tasks/transactions/… per hour
How are response time and throughput affected by
Replacing the processor with a faster version?
Adding more processors?
We’ll focus on response time for now…
11
Measuring Execution Time
Elapsed time
Total response time, including all aspects
Processing, I/O, OS overhead, idle time
Determines system performance
CPU time
Time spent processing a given job
Discounts I/O time, other jobs’ shares
Comprises user CPU time and system CPU time
Different programs are affected differently by CPU and system performance
12
CPU Clocking
Operation of digital hardware governed by a constant-rate clock
Clock period: duration of a clock cycle
e.g., 250ps = 0.25ns = 250×10–12s
Clock frequency (rate): cycles per second
e.g., 4.0GHz = 4000MHz = 4.0×109Hz
13
CPU Time
CPU Time = CPU Clock Cycles × Clock Cycle Time
CPU Clock Cycles
=
Clock Rate
Performance improved by
Reducing number of clock cycles
Increasing clock rate
Hardware designer must often trade off clock rate against cycle count
14
Instruction Count and CPI
Clock Cycles = Instruction Count × Cycles per Instruction
CPU Time = Instruction Count × CPI × Clock Cycle Time
Instruction Count × CPI
=
Clock Rate
Instruction Count for a program
Determined by program, ISA and compiler
Average cycles per instruction
Determined by CPU hardware
If different instructions have different CPI
Average CPI affected by instruction mix
15
CPI in More Detail
If different instruction classes take different numbers of cycles
n
Clock Cycles = ∑ (CPIi × Instruction Count i )
i=1
Weighted average CPI
Clock Cycles n
Instruction Count i
CPI = = ∑ CPIi ×
Instruction Count i=1 Instruction Count
Relative frequency
16
CPI Example
Alternative compiled code sequences using instructions in classes A,
B, C
Sequence 1: IC = 5 Sequence 2: IC = 6
Clock Cycles Clock Cycles
= 2×1 + 1×2 + 2×3 = 4×1 + 1×2 + 1×3
= 10 =9
Avg. CPI = 10/5 = 2.0 Avg. CPI = 9/6 = 1.5
17
Performance Summary
Instructions Clock cycles Seconds
CPU Time = × ×
Program Instruction Clock cycle
Performance depends on
Algorithm: affects IC, possibly CPI
Programming language: affects IC, CPI
Compiler: affects IC, CPI
Instruction set architecture: affects IC, CPI, Tc
18
Power Trends
The power wall
We can’t reduce voltage
further
We can’t remove more
heat
How else can we
improve performance?
In CMOS IC technology
Power = Capacitive load × Voltage 2 × Frequency
×30 5V → 1V ×1000 19
Uniprocessor Performance
Constrained by power, instruction-level parallelism, memory latency
20
Multiprocessors
Multicore microprocessors
More than one processor per chip
Requires explicitly parallel programming
Compare with instruction level parallelism
Hardware executes multiple instructions at once
Hidden from the programmer
Hard to do
Programming for performance
Load balancing
Optimizing communication and synchronization
21
Instruction Set
The repertoire of instructions of a computer
Different computers have different instruction sets
But with many aspects in common
Early computers had very simple instruction sets
Simplified implementation
Many modern computers also have simple instruction sets
22
The MIPS Instruction Set
Used as the example throughout this module
Stanford MIPS commercialized by MIPS Technologies
(www.mips.com)
Large share of embedded core market
Applications in consumer electronics, network/storage equipment, cameras,
printers, …
Typical of many modern ISAs
23
Arithmetic Operations
Add and subtract, three operands
Two sources and one destination
add a, b, c # a gets b + c
All arithmetic operations have this form
Design Principle 1: Simplicity favours regularity
Regularity makes implementation simpler
Simplicity enables higher performance at lower cost
24
Arithmetic Example
C code:
f = (g + h) - (i + j);
Compiled MIPS code:
add t0, g, h # temp t0 = g + h
add t1, i, j # temp t1 = i + j
sub f, t0, t1 # f = t0 - t1
25
Register Operands
Arithmetic instructions use register operands
MIPS has a 32 × 32-bit register file
Use for frequently accessed data
Numbered 0 to 31
32-bit data called a “word”
Assembler names
$t0, $t1, …, $t9 for temporary values
$s0, $s1, …, $s7 for saved variables
Design Principle 2: Smaller is faster
c.f. main memory: millions of locations
26
Register Operand Example
C code:
f = (g + h) - (i + j);
f, …, j in $s0, …, $s4
Compiled MIPS code:
add $t0, $s1, $s2
add $t1, $s3, $s4
sub $s0, $t0, $t1
27
Memory Operands
Main memory used for composite data
Arrays, structures, dynamic data
To apply arithmetic operations
Load values from memory into registers
Store result from register to memory
Memory is byte addressed
Each address identifies an 8-bit byte
Words are aligned in memory
Address must be a multiple of 4
MIPS is Big Endian
Most-significant byte at least address of a word
c.f. Little Endian: least-significant byte at least address
28
Memory Operand Example 1
C code:
g = h + A[8];
g in $s1, h in $s2, base address of A in $s3
Compiled MIPS code:
Index 8 requires offset of 32
4 bytes per word
lw $t0, 32($s3) # load word
add $s1, $s2, $t0
offset base register 29
Memory Operand Example 2
C code:
A[12] = h + A[8];
h in $s2, base address of A in $s3
Compiled MIPS code:
Index 8 requires offset of 32
lw $t0, 32($s3) # load word
add $t0, $s2, $t0
sw $t0, 48($s3) # store word
30
Registers vs. Memory
Registers are faster to access than memory
Operating on memory data requires loads and stores
More instructions to be executed
Compiler must use registers for variables as much as possible
Only spill to memory for less frequently used variables
Register optimization is important!
31
Immediate Operands
Constant data specified in an instruction
addi $s3, $s3, 4
No subtract immediate instruction
Just use a negative constant
addi $s2, $s1, -1
Design Principle 3: Make the common case fast
Small constants are common
Immediate operand avoids a load instruction
32
The Constant Zero
MIPS register 0 ($zero) is the constant 0
Cannot be overwritten
Useful for common operations
E.g., move between registers
add $t2, $s1, $zero
33
MIPS R-format Instructions
op rs rt rd shamt funct
6 bits 5 bits 5 bits 5 bits 5 bits 6 bits
Instruction fields
op: operation code (opcode)
rs: first source register number
rt: second source register number
rd: destination register number
shamt: shift amount (00000 for now)
funct: function code (extends opcode)
34
R-format Example
op rs rt rd shamt funct
6 bits 5 bits 5 bits 5 bits 5 bits 6 bits
add $t0, $s1, $s2
special $s1 $s2 $t0 0 add
0 17 18 8 0 32
000000 10001 10010 01000 00000 100000
000000100011001001000000001000002 = 0232402016 35
MIPS I-format Instructions
op rs rt constant or address
6 bits 5 bits 5 bits 16 bits
Immediate arithmetic and load/store instructions
rt: destination or source register number
Constant: –215 to +215 – 1
Address: offset added to base address in rs
Design Principle 4: Good design demands good compromises
Different formats complicate decoding, but allow 32-bit instructions uniformly
Keep formats as similar as possible
36
Logical Operations
Instructions for bitwise manipulation
Useful for extracting and inserting groups of bits in a word
37
Shift Operations
op rs rt rd shamt funct
6 bits 5 bits 5 bits 5 bits 5 bits 6 bits
shamt: how many positions to shift
Shift left logical
Shift left and fill with 0 bits
sll by i bits multiplies by 2i
Shift right logical
Shift right and fill with 0 bits
srl by i bits divides by 2i (unsigned only)
38
NOT Operations
Useful to invert bits in a word
Change 0 to 1, and 1 to 0
MIPS has NOR 3-operand instruction
a NOR b == NOT ( a OR b )
Register 0: always read as
zero
nor $t0, $t1, $zero
$t1 0000 0000 0000 0000 0011 1100 0000 0000
$t0 1111 1111 1111 1111 1100 0011 1111 1111
39
Conditional Operations
Branch to a labeled instruction if a condition is true
Otherwise, continue sequentially
beq rs, rt, L1
if (rs == rt) branch to instruction labeled L1;
bne rs, rt, L1
if (rs != rt) branch to instruction labeled L1;
j L1
unconditional jump to instruction labeled L1
40
Compiling If Statements
C code:
if (i==j) f = g+h;
else f = g-h;
f, g, … in $s0, $s1, …
Compiled MIPS code:
bne $s3, $s4, Else
add $s0, $s1, $s2
j Exit
Else: sub $s0, $s1, $s2
Exit: …
Assembler calculates addresses
41
More Conditional Operations
Set result to 1 if a condition is true
Otherwise, set to 0
slt rd, rs, rt
if (rs < rt) rd = 1; else rd = 0;
slti rt, rs, constant
if (rs < constant) rt = 1; else rt = 0;
Use in combination with beq, bne
slt $t0, $s1, $s2 # if ($s1 < $s2)
bne $t0, $zero, L # branch to L
42
Branch Addressing
Branch instructions specify
Opcode, two registers, target address
Most branch targets are near branch
Forward or backward
op rs rt constant or address
6 bits 5 bits 5 bits 16 bits
PC-relative addressing
Target address = PC + offset × 4
PC already incremented by 4 by this time
43
Jump Addressing
Jump (j and jal) targets could be anywhere in text segment
Encode full address in instruction
op address
6 bits 26 bits
(Pseudo)Direct jump addressing
Target address = PC31…28 : (address × 4)
44
Addressing Mode Summary
45
Thank You!
46