The Computer Revolution
▪ Progress in computer technology
▪ Underpinned by Moore’s Law
▪ Makes novel applications feasible
▪ Computers in automobiles
▪ Cell phones
▪ Human genome project
▪ World Wide Web
▪ Search Engines
▪ Computers are pervasive
15-Aug-21 Faculty of Computer Science and Engineering 2
History of Computer Development
▪ First generation 1945 - 1955
▪ vacuum tubes, plug boards
▪ Second generation 1955 - 1965
▪ transistors, batch systems
▪ Third generation 1965 – 1980
▪ ICs and multiprogramming
▪ Fourth generation 1980 – present
▪ personal computers (desktop, laptop)
▪ supercomputers
▪ data centers, clusters, etc.
Moore’s Law
The History: at the very beginning
ENIAC, 1943, 30 tons, 200 kW, ~1000 ops/sec
The History: Now
Typical 2021 laptop
~1kg, 10W, 10 billion ops/sec
Classes of Computers
Source: internet
Classes of Computers
▪ Personal computers
▪ General purpose, variety of software
▪ Subject to cost/performance trade-off
▪ Embedded computers
▪ Hidden as components of systems
▪ Stringent power/performance/cost constraints
Classes of Computers
▪ Server computers
▪ Network based
▪ High capacity, performance, reliability
▪ Range from small servers to building sized
▪ Supercomputers
▪ High end scientific and engineering calculations
▪ Highest capability but represent a small fraction of
the overall computer market
The PostPC Era has arrived
▪ Your next computer is not a computer (Apple)
Source: IDC
The PostPC Era
▪ Cloud computing
▪ Warehouse Scale Computers (WSC)
▪ Software as a Service (SaaS)
▪ A portion of the software runs on a PMD and a portion runs in the cloud
▪ Amazon and Google
▪ Personal Mobile Device (PMD)
▪ Battery operated
▪ Connects to the Internet
▪ Hundreds of dollars
▪ Smart phones, tablets, electronic glasses
Understanding Performance
▪ Algorithm
▪ Determines number of operations executed
▪ Programming language, compiler, architecture
▪ Determine number of machine instructions
executed per operation
▪ Processor and memory system
▪ Determine how fast instructions are executed
▪ I/O system (including OS)
▪ Determines how fast I/O operations are executed
Below Your Program
▪ Application software
▪ Written in high-level language
▪ System software
▪ Compiler: translates HLL code to machine code
▪ Operating System: service code
▪ Handling input/output
▪ Managing memory and storage
▪ Scheduling tasks & sharing resources
▪ Hardware
▪ Processor, memory, I/O controllers
Levels of Program Code
▪ High-level language program (in C)
▪ Level of abstraction closer to problem domain
▪ Provides for productivity and portability

    void swap(int v[], int k){
        int temp;
        temp = v[k];
        v[k] = v[k+1];
        v[k+1] = temp;
    }

▪ Assembly language program (for MIPS), produced by the compiler
▪ Textual representation of instructions

    swap: multi $2, $5, 4
          add   $2, $4, $2
          lw    $15, 0($2)
          lw    $16, 4($2)
          sw    $16, 0($2)
          sw    $15, 4($2)
          jr    $31

▪ Hardware representation: binary machine language program (for MIPS), produced by the assembler
▪ Binary digits (bits)
▪ Encoded instructions and data

    00000000101000100000000100011000
    00000000100000100001000000100001
    10001101111000100000000000000000
    10001110000100100000000000000100
    10101110000100100000000000000000
    10101101111000100000000000000100
    00000011111000000000000000001000

▪ Which layer does each of program.exe, program.asm, and program.c represent?
Components of a Computer
▪ Same components for
all kinds of computer
▪ Desktop, server, embedded
▪ Input/output includes
▪ User-interface devices
▪ Display, keyboard, mouse
▪ Storage devices
▪ Hard disk, CD/DVD, flash
▪ Network adapters
▪ For communicating with
other computers
Inside the Processor (CPU)
▪ Datapath: performs operations on data
▪ Control: sequences Datapath, memory, …
▪ Cache memory
▪ Small fast SRAM memory for immediate access
to data
Eight Great Ideas
▪ Design for Moore’s Law
▪ Use abstraction to simplify design
▪ Make the common case fast
▪ Performance via parallelism
▪ Performance via pipelining
▪ Performance via prediction
▪ Hierarchy of memories
▪ Dependability via redundancy
Opening the Box
Capacitive multitouch LCD screen
3.8 V, 25 Watt-hour battery
Computer board
Through the Looking Glass
▪ LCD screen: picture elements (pixels)
▪ Mirrors content of frame buffer memory
Touchscreen
▪ PostPC device
▪ Supersedes keyboard and mouse
▪ Resistive and Capacitive types
▪ Most tablets, smart phones use capacitive
▪ Capacitive allows multiple touches
simultaneously
Inside the Processor
▪ Apple A14
Abstractions
▪ Abstraction helps us deal with complexity
▪ Hide lower-level detail
▪ Instruction set architecture (ISA)
▪ The hardware/software interface
▪ Application binary interface
▪ The ISA plus system software interface
▪ Implementation
▪ The details underlying an interface
A Safe Place for Data
▪ Volatile main memory
▪ Loses instructions and data when powered off
▪ Non-volatile secondary memory
▪ SSD, Magnetic disk
▪ Flash memory
▪ Optical disk (CDROM, DVD)
Networks
▪ Communication, resource sharing, nonlocal
access
▪ Local area network (LAN): Ethernet
▪ Wide area network (WAN): the Internet
▪ Wireless network: WiFi, Bluetooth
Technology Trends
▪ Electronics technology continues to evolve
▪ Increased capacity and performance
▪ Reduced cost
Year   Technology                   Relative performance/cost
1951   Vacuum tube                  1
1965   Transistor                   35
1975   Integrated circuit (IC)      900
1995   Very large scale IC (VLSI)   2,400,000
2013   Ultra large scale IC         250,000,000,000
Semiconductor Technology
▪ Silicon: semiconductor
▪ Add materials to transform properties:
▪ Conductors
▪ Insulators
▪ Switch
Manufacturing ICs
▪ Yield: proportion of working dies per wafer
Intel Core i7 Wafer
▪ 300mm wafer, 280 chips, 32nm technology
▪ Each chip is 20.7 x 10.5 mm
Integrated Circuit Cost
▪ Nonlinear relation to area and defect rate
▪ Wafer cost and area are fixed
▪ Defect rate determined by manufacturing process
▪ Die area determined by architecture and circuit design
$$\text{Cost per die} = \frac{\text{Cost per wafer}}{\text{Dies per wafer} \times \text{Yield}}$$
$$\text{Dies per wafer} \approx \frac{\text{Wafer area}}{\text{Die area}}$$
$$\text{Yield} = \frac{1}{\left(1 + \text{Defects per area} \times \text{Die area}/2\right)^2}$$
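The cost model above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the wafer cost of 5000, and the defect density of 0.001 defects per mm² are hypothetical values chosen for the example. Only the 300 mm wafer diameter and the 20.7 × 10.5 mm die size come from the slides.

```python
import math

def dies_per_wafer(wafer_area, die_area):
    """Approximate die count; ignores partial dies lost at the wafer edge."""
    return wafer_area / die_area

def die_yield(defects_per_area, die_area):
    """Fraction of working dies, per the yield formula on the slide."""
    return 1.0 / (1.0 + defects_per_area * die_area / 2.0) ** 2

def cost_per_die(wafer_cost, wafer_area, die_area, defects_per_area):
    """Wafer cost spread over the dies that actually work."""
    dies = dies_per_wafer(wafer_area, die_area)
    return wafer_cost / (dies * die_yield(defects_per_area, die_area))

# Slide values: 300 mm wafer, 20.7 x 10.5 mm die (areas in mm^2).
wafer_area = math.pi * (300 / 2) ** 2
die_area = 20.7 * 10.5

# Hypothetical values, not from the slides.
wafer_cost = 5000.0
defects_per_mm2 = 0.001

print(f"dies per wafer ~ {dies_per_wafer(wafer_area, die_area):.0f}")
print(f"yield ~ {die_yield(defects_per_mm2, die_area):.2f}")
print(f"cost per die ~ {cost_per_die(wafer_cost, wafer_area, die_area, defects_per_mm2):.2f}")
```

The area ratio gives roughly 325 dies, more than the 280 chips quoted for the Core i7 wafer, because the approximation ignores partial dies at the wafer edge. Doubling the die area more than doubles the cost per die whenever the defect density is nonzero, which is the nonlinear relation the slide refers to.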
Defining Performance
▪ Which airplane has the best performance?
Airplane           Passenger capacity   Cruising range (miles)   Cruising speed (mph)   Passengers x mph
Boeing 777         375                  4630                     610                    228,750
Boeing 747         470                  4150                     610                    286,700
BAC/Sud Concorde   132                  4000                     1350                   178,200
Douglas DC-8-50    146                  8720                     544                    79,424
Response Time and Throughput
▪ Response time
▪ How long it takes to do a task
▪ Throughput
▪ Total work done per unit time
▪ e.g., tasks/transactions/… per hour
▪ How are response time and throughput affected by
▪ Replacing the processor with a faster version?
▪ Adding more processors?
▪ We’ll focus on response time for now…
Relative Performance
▪ Define: Performance = 1/Execution Time
▪ “X is n times faster than Y”
$$\frac{\text{Performance}_X}{\text{Performance}_Y} = \frac{\text{Execution time}_Y}{\text{Execution time}_X} = n$$
▪ Example: time taken to run a program
▪ 10s on A, 15s on B
▪ Execution Time_B / Execution Time_A = 15s / 10s = 1.5
▪ So, A is 1.5 times faster than B
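The definition of relative performance can be checked with a short Python sketch. The function names `performance` and `times_faster` are illustrative choices, not part of the slides.

```python
def performance(execution_time_s):
    """Performance is defined as the reciprocal of execution time."""
    return 1.0 / execution_time_s

def times_faster(time_x_s, time_y_s):
    """Return n such that X is n times faster than Y."""
    return time_y_s / time_x_s

# Slide example: the program takes 10 s on A and 15 s on B.
n = times_faster(10.0, 15.0)
print(f"A is {n} times faster than B")  # prints "A is 1.5 times faster than B"
```

Computing the ratio from execution times, rather than from the reciprocals, avoids an unnecessary rounding step. Both forms express the same definition.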
Measuring Execution Time
▪ Elapsed time
▪ Total response time, including all aspects
▪ Processing, I/O, OS overhead, idle time
▪ Determines system performance
▪ CPU time
▪ Time spent processing a given job
▪ Discounts I/O time, other jobs’ shares
▪ Comprises user CPU time and system CPU time
▪ Different programs are affected differently by CPU and
system performance
CPU Clocking
▪ Operation of digital hardware governed by a constant-rate clock
[Figure: clock timing diagram showing the clock period (cycles), data transfer and computation, and state update]
▪ Clock period: duration of a clock cycle
▪ e.g., 250 ps = 0.25 ns = $250 \times 10^{-12}$ s
▪ Clock frequency (rate): cycles per second
▪ e.g., 4.0 GHz = 4000 MHz = $4.0 \times 10^{9}$ Hz
CPU Time
▪ Performance improved by
▪ Reducing number of clock cycles
▪ Increasing clock rate
▪ Hardware designer must often trade off clock
rate against cycle count
$$\text{CPU Time} = \text{CPU Clock Cycles} \times \text{Clock Cycle Time} = \frac{\text{CPU Clock Cycles}}{\text{Clock Rate}}$$
CPU Time Example
▪ Computer A: 2GHz clock, 10s CPU time
▪ Designing Computer B
▪ Aim for 6s CPU time
▪ Can do faster clock, but causes 1.2 × clock cycles
▪ How fast must Computer B clock be?
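$$\text{Clock Rate}_B = \frac{\text{Clock Cycles}_B}{\text{CPU Time}_B} = \frac{1.2 \times \text{Clock Cycles}_A}{6\,\text{s}}$$
$$\text{Clock Cycles}_A = \text{CPU Time}_A \times \text{Clock Rate}_A = 10\,\text{s} \times 2\,\text{GHz} = 20 \times 10^9$$
$$\text{Clock Rate}_B = \frac{1.2 \times 20 \times 10^9}{6\,\text{s}} = \frac{24 \times 10^9}{6\,\text{s}} = 4\,\text{GHz}$$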
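The same calculation can be reproduced in Python as a quick check. The helper names are illustrative, and all numbers are the ones given in the example.

```python
def clock_cycles(cpu_time_s, clock_rate_hz):
    """Clock Cycles = CPU Time x Clock Rate."""
    return cpu_time_s * clock_rate_hz

def required_clock_rate(cycles, target_cpu_time_s):
    """Clock Rate = Clock Cycles / CPU Time."""
    return cycles / target_cpu_time_s

# Computer A: 2 GHz clock, 10 s CPU time.
cycles_a = clock_cycles(10.0, 2e9)

# Computer B needs 1.2x as many cycles and must finish in 6 s.
cycles_b = 1.2 * cycles_a
rate_b = required_clock_rate(cycles_b, 6.0)

print(f"Computer B clock rate: {rate_b / 1e9:.1f} GHz")  # prints 4.0 GHz
```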
Instruction Count and CPI
▪ Instruction Count for a program
▪ Determined by program, ISA and compiler
▪ Average cycles per instruction
▪ Determined by CPU hardware
▪ If different instructions have different CPI
▪ Average CPI affected by instruction mix
$$\text{Clock Cycles} = \text{Instruction Count} \times \text{Cycles per Instruction}$$
$$\text{CPU Time} = \text{Instruction Count} \times \text{CPI} \times \text{Clock Cycle Time} = \frac{\text{Instruction Count} \times \text{CPI}}{\text{Clock Rate}}$$
CPI Example
▪ Computer A: Cycle Time = 250ps, CPI = 2.0
▪ Computer B: Cycle Time = 500ps, CPI = 1.2
▪ Same ISA
▪ Which is faster, and by how much?
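$$\text{CPU Time}_A = \text{Instruction Count} \times \text{CPI}_A \times \text{Cycle Time}_A = I \times 2.0 \times 250\,\text{ps} = I \times 500\,\text{ps}$$
$$\text{CPU Time}_B = \text{Instruction Count} \times \text{CPI}_B \times \text{Cycle Time}_B = I \times 1.2 \times 500\,\text{ps} = I \times 600\,\text{ps}$$
▪ A is faster…
$$\frac{\text{CPU Time}_B}{\text{CPU Time}_A} = \frac{I \times 600\,\text{ps}}{I \times 500\,\text{ps}} = 1.2$$
▪ …by this much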
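A short Python sketch of the comparison follows. Because both computers run the same ISA, the instruction count I is identical and cancels in the ratio; the value of 1,000,000 instructions used here is an arbitrary placeholder, not a figure from the slides.

```python
def cpu_time(instruction_count, cpi, cycle_time_s):
    """CPU Time = Instruction Count x CPI x Clock Cycle Time."""
    return instruction_count * cpi * cycle_time_s

I = 1_000_000  # placeholder instruction count; any positive value gives the same ratio

time_a = cpu_time(I, 2.0, 250e-12)  # Computer A: CPI 2.0, 250 ps cycle
time_b = cpu_time(I, 1.2, 500e-12)  # Computer B: CPI 1.2, 500 ps cycle

ratio = time_b / time_a
print(f"A is {ratio:.1f} times faster than B")  # prints "A is 1.2 times faster than B"
```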
CPI in More Detail
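▪ If different instruction classes take different numbers of cycles
$$\text{Clock Cycles} = \sum_{i=1}^{n} \left(\text{CPI}_i \times \text{Instruction Count}_i\right)$$
▪ Weighted average CPI
$$\text{CPI} = \frac{\text{Clock Cycles}}{\text{Instruction Count}} = \sum_{i=1}^{n} \left(\text{CPI}_i \times \frac{\text{Instruction Count}_i}{\text{Instruction Count}}\right)$$
▪ The fraction $\text{Instruction Count}_i / \text{Instruction Count}$ is the relative frequency of class $i$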
CPI Example
▪ Alternative compiled code sequences using
instructions in classes A, B, C
Class              A   B   C
CPI for class      1   2   3
IC in sequence 1   2   1   2
IC in sequence 2   4   1   1
▪ Sequence 1: IC = 5
▪ Clock Cycles = 2×1 + 1×2 + 2×3 = 10
▪ Avg. CPI = 10/5 = 2.0
▪ Sequence 2: IC = 6
▪ Clock Cycles = 4×1 + 1×2 + 1×3 = 9
▪ Avg. CPI = 9/6 = 1.5
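The weighted-average CPI calculation for the two code sequences can be sketched in Python as follows. The dictionary layout and function names are illustrative choices; the CPI values and instruction counts are those in the table.

```python
# CPI of each instruction class, from the table.
CLASS_CPI = {"A": 1, "B": 2, "C": 3}

def total_cycles(counts):
    """Sum of CPI_i x Instruction Count_i over all classes."""
    return sum(CLASS_CPI[cls] * n for cls, n in counts.items())

def average_cpi(counts):
    """Clock cycles divided by total instruction count."""
    return total_cycles(counts) / sum(counts.values())

seq1 = {"A": 2, "B": 1, "C": 2}
seq2 = {"A": 4, "B": 1, "C": 1}

print(total_cycles(seq1), average_cpi(seq1))  # prints 10 2.0
print(total_cycles(seq2), average_cpi(seq2))  # prints 9 1.5
```

Sequence 2 executes more instructions but needs fewer clock cycles, so instruction count alone does not determine which sequence is faster.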
Performance Summary
▪ Performance depends on
▪ Algorithm: affects IC, possibly CPI
▪ Programming language: affects IC, CPI
▪ Compiler: affects IC, CPI
▪ Instruction set architecture: affects IC, CPI, Tc
$$\text{CPU Time} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Clock cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Clock cycle}}$$
Power Trends
▪ In CMOS IC technology
$$\text{Power} = \text{Capacitive load} \times \text{Voltage}^2 \times \text{Frequency}$$
▪ Power: ×30; Voltage: 5V → 1V; Frequency: ×1000
Multiprocessors
▪ Multicore microprocessors
▪ More than one processor per chip
▪ Requires explicitly parallel programming
▪ Compare with instruction level parallelism
▪ Hardware executes multiple instructions at once
▪ Hidden from the programmer
▪ Hard to do
▪ Programming for performance
▪ Load balancing
▪ Optimizing communication and synchronization
SPEC CPU Benchmark
▪ Programs used to measure performance
▪ Supposedly typical of actual workload
▪ Standard Performance Evaluation Corp (SPEC)
▪ Develops benchmarks for CPU, I/O, Web, …
▪ SPEC CPU2006
▪ Elapsed time to execute a selection of programs
▪ Negligible I/O, so focuses on CPU performance
▪ Normalize relative to reference machine
▪ Summarize as geometric mean of performance ratios
▪ CINT2006 (integer) and CFP2006 (floating-point)
$$\sqrt[n]{\prod_{i=1}^{n} \text{Execution time ratio}_i}$$
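A minimal Python sketch of the geometric-mean summary is shown below. The reference and measured times are made-up numbers for illustration, not real SPEC results, and the function names are hypothetical.

```python
import math

def execution_time_ratio(reference_time_s, measured_time_s):
    """Normalize a measured time against the reference machine."""
    return reference_time_s / measured_time_s

def geometric_mean(ratios):
    """n-th root of the product of n ratios."""
    return math.prod(ratios) ** (1.0 / len(ratios))

# Made-up (reference, measured) times in seconds for three benchmark programs.
times = [(1000.0, 500.0), (900.0, 100.0), (800.0, 200.0)]
ratios = [execution_time_ratio(ref, measured) for ref, measured in times]

print(f"ratios: {ratios}")
print(f"geometric mean: {geometric_mean(ratios):.2f}")
```

With the geometric mean, the comparison between two machines does not depend on which machine is chosen as the reference, which is one common reason for preferring it over the arithmetic mean when averaging ratios.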
SPEC Power Benchmark
▪ Power consumption of server at different
workload levels
▪ Performance: ssj_ops/sec
▪ Power: Watts (Joules/sec)
$$\text{Overall ssj\_ops per Watt} = \left(\sum_{i=0}^{10} \text{ssj\_ops}_i\right) \Big/ \left(\sum_{i=0}^{10} \text{power}_i\right)$$
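The overall metric is a ratio of sums, not an average of per-level ratios. The Python sketch below illustrates this with made-up measurements at 11 workload levels (100% down to 0%); none of the numbers are real benchmark results.

```python
def overall_ssj_ops_per_watt(ssj_ops, power_watts):
    """Sum of throughput over all load levels divided by sum of power."""
    if len(ssj_ops) != len(power_watts):
        raise ValueError("one power reading is needed per workload level")
    return sum(ssj_ops) / sum(power_watts)

# Made-up readings for load levels 100%, 90%, ..., 10%, 0% (active idle).
ssj_ops = [1000 * level for level in range(10, -1, -1)]
power_watts = [100 + 15 * level for level in range(10, -1, -1)]

print(f"overall ssj_ops per Watt: {overall_ssj_ops_per_watt(ssj_ops, power_watts):.1f}")
```

Because the idle level contributes power but no operations, a server that draws a lot of power when idle scores worse on this metric even if its peak efficiency is good.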
Pitfall: Amdahl’s Law
▪ Improving an aspect of a computer and expecting a
proportional improvement in overall performance
$$T_\text{improved} = \frac{T_\text{affected}}{\text{improvement factor}} + T_\text{unaffected}$$
▪ Example: multiply accounts for 80s/100s
▪ How much improvement in multiply performance to
get 5× overall?
$$20 = \frac{80}{n} + 20$$
▪ Can’t be done! This would require 80/n = 0, which no finite n satisfies
▪ Corollary: make the common case fast
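The multiply example can be explored numerically with a short Python sketch. The function names are illustrative; the 80 s and 20 s split is the one given in the example.

```python
def improved_time(t_affected, t_unaffected, improvement_factor):
    """Amdahl's Law: only the affected part shrinks."""
    return t_affected / improvement_factor + t_unaffected

def overall_speedup(t_affected, t_unaffected, improvement_factor):
    """Original total time divided by improved total time."""
    original = t_affected + t_unaffected
    return original / improved_time(t_affected, t_unaffected, improvement_factor)

# Multiply takes 80 s of a 100 s run; the other 20 s is unaffected.
for factor in (2, 4, 10, 100, 1_000_000):
    print(f"multiply {factor}x faster -> overall {overall_speedup(80.0, 20.0, factor):.3f}x")
```

The overall speedup approaches 100/20 = 5 as the improvement factor grows but never reaches it, which is why the 5× target in the example cannot be met by speeding up multiply alone.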
Pitfall: MIPS as a Performance Metric
▪ MIPS: Millions of Instructions Per Second
▪ Doesn’t account for
▪ Differences in ISAs between computers
▪ Differences in complexity between instructions
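$$\text{MIPS} = \frac{\text{Instruction count}}{\text{Execution time} \times 10^6} = \frac{\text{Instruction count}}{\dfrac{\text{Instruction count} \times \text{CPI}}{\text{Clock rate}} \times 10^6} = \frac{\text{Clock rate}}{\text{CPI} \times 10^6}$$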
▪ CPI varies between programs on a given CPU
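The following Python sketch shows why a MIPS rating depends on the program being run. The 4 GHz clock rate and the two CPI values are hypothetical numbers chosen for illustration.

```python
def mips_from_time(instruction_count, execution_time_s):
    """MIPS = Instruction count / (Execution time x 10^6)."""
    return instruction_count / (execution_time_s * 1e6)

def mips_from_cpi(clock_rate_hz, cpi):
    """Equivalent form: MIPS = Clock rate / (CPI x 10^6)."""
    return clock_rate_hz / (cpi * 1e6)

clock_rate_hz = 4e9  # hypothetical 4 GHz CPU

# Two hypothetical programs on the same CPU with different average CPI.
print(mips_from_cpi(clock_rate_hz, 1.0))  # prints 4000.0
print(mips_from_cpi(clock_rate_hz, 2.0))  # prints 2000.0
```

The same processor reports 4000 MIPS on one program and 2000 MIPS on the other, so a single MIPS figure says little without knowing the workload, and it cannot be compared across different ISAs at all.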
Reducing Power
▪ Suppose a new CPU has
▪ 85% of capacitive load of old CPU
▪ 15% voltage and 15% frequency reduction
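$$\frac{P_\text{new}}{P_\text{old}} = \frac{C_\text{old} \times 0.85 \times (V_\text{old} \times 0.85)^2 \times F_\text{old} \times 0.85}{C_\text{old} \times V_\text{old}^2 \times F_\text{old}} = 0.85^4 = 0.52$$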
▪ The power wall
▪ We can’t reduce voltage further
▪ We can’t remove more heat
▪ How else can we improve performance?
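The power ratio for the new CPU can be verified with a small Python sketch based on Power = Capacitive load × Voltage² × Frequency. The function name and parameter names are illustrative; the 0.85 scale factors are those stated in the example.

```python
def relative_power(load_scale, voltage_scale, frequency_scale):
    """P_new / P_old when each factor of C x V^2 x F is scaled independently."""
    return load_scale * voltage_scale ** 2 * frequency_scale

# New CPU: 85% of the capacitive load, 15% lower voltage, 15% lower frequency.
ratio = relative_power(0.85, 0.85, 0.85)
print(f"P_new / P_old = {ratio:.2f}")  # prints 0.52
```

Voltage enters squared, so it is the most effective lever of the three, which is why running out of room to lower voltage leads to the power wall.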
Concluding Remarks
▪ Cost/performance is improving
▪ Due to underlying technology development
▪ Hierarchical layers of abstraction
▪ In both hardware and software
▪ Instruction set architecture
▪ The hardware/software interface
▪ Execution time: the best performance measure
▪ Power is a limiting factor
▪ Use parallelism to improve performance
Exercise
▪ Consider a program X with the following instruction mix.
Instruction class    A      B      C      D
CPI                  2      4      3      2.5
# instructions       1000   2000   3000   4000
▪ What is the CPU time of X when it runs at 2 GHz?
▪ Suppose performance is improved by halving the number of class B instructions. What is the speedup?
▪ Suppose performance may be improved by changing only one class of instruction. What is the upper limit on the speedup?