ECE 454
Computer Systems Programming
Introduction
1
Introduction
• Course Outline: Description, Subjects, Requirements, Evaluation,
Schedule, Important Dates
• Course Overview
2
Recommended Textbook
• Textbook is not essential
• The relevant contents will be covered in the slides
• The links for some online resources will be posted
• Textbook:
Computer Systems: A Programmer's Perspective
Authors: Randal E. Bryant and David R. O'Hallaron
Publisher: Prentice Hall, 3rd Edition, 2015
3
Communication
• All email communication in this course MUST use your official
UofT email account
• Put the course code (ECE454) at the start of the subject line
• Quercus is the Learning Management System
4
Evaluation
• Participation*: 5%
• Labs (5 labs, covering the assignments)
  • Assignments: 40%
• Tests 1 & 2: 20% (Oct 18: 10%, Nov 15: 10%)
• Final Exam: 35% (TBD)
* See the course outline for details
5
Labs
• You must work on all lab assignments in a team of 2
  • More details are in the assignment guidelines
• Lab submission
  • Electronic submission only, through Quercus
  • Follow the instructions in the lab guidelines
6
Note!
• In case of cheating, the mark is 0 and an official letter is placed in your file
• What is cheating?
  • Using someone else’s solution to finish your assignment
  • Sharing code with others
• What is NOT cheating?
  • Helping others use systems or tools
  • Helping others with high-level design issues
• We do use cheater-beaters
  • They automatically compare your solutions with others’
7
Questions?
8
Course Objectives (1)
• System Programming
• Most engineering jobs involve System Programming
• System Programmers are increasingly in demand
• A system programmer is worth 1000x normal – Bill Gates
9
Course Objectives (2)
• Get a better understanding of software/hardware interactions
  • Important whether you are software- or hardware-oriented
  • Important whether you are considering a programming job or grad school
• Computing is at the heart of many interesting projects today
10
Start a Company in your 20’s!
11
Founders of Successful Tech
Companies Are Mostly Middle-Aged
Tony Fadell started Nest in 2010, after leading the engineering
team that created the iPod and playing a crucial role in the
development of the iPhone. Like many entrepreneurs, he was
then over 40. (The New York Times, Aug 29, 2019)
12
Objectives in Programming(1)
• Productivity (choice of language, programming practices)
  • Readability
  • Debuggability
  • Reliability
  • Maintainability
• Performance (systems understanding)
  • Scalability
  • Efficiency
13
Objectives in Programming(2)
• Suppose you’re building Facebook’s “homepage” feature
void display_homepage (user) {
    friendlist = get_friendlist (user);
    foreach (friend in friendlist) {
        update = get_update_status (friend);
        display (update);
    }
}
How can I double the speed of this program?
Easy: TAKE ECE 454!!!
14
Multicores - Present and Future
2x cores every 1-2 years: 1000 cores by 2020!?
[Figure: chip diagrams for Pentium IV, Core 2 Duo, Core 2 Quad, 8-core, and 16-core processors]
15
Only One Sequential Program to Run?
void display_homepage (user) {
    friendlist = get_friendlist (user);
    foreach (friend in friendlist) {
        update = get_update_status (friend);
        display (update);
    }
}
[Figure: timeline of the sequential program running on a single core. On a 2-core chip one core is idle; on a 16-core chip 15 cores are idle!]
16
Improving Execution Time
[Figure: execution time of a single program, on one core and on four cores]
• A single program needs parallel threads to reduce its execution time
17
void display_homepage (user) {
    friendlist = get_friendlist (user);
    foreach (friend in friendlist) {
        pthread_create(fetch_and_display, friend);
    }
}
void fetch_and_display (friend) {
    update = get_update_status (friend);
    display (update);
}
[Figure: four fetch_and_display threads running in parallel, one per core]
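The pthread_create pseudocode above omits the arguments the real POSIX call requires. A minimal sketch in C, assuming friends are plain integer IDs and using a made-up stand-in for get_update_status (neither assumption comes from the slides):

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for the slide's get_update_status(): a real
 * implementation would fetch the friend's status from storage. */
static int get_update_status(int friend_id) {
    return friend_id * 10;
}

/* Thread entry point: pthreads requires the signature void *(*)(void *). */
static void *fetch_and_display(void *arg) {
    int friend_id = *(int *)arg;
    int update = get_update_status(friend_id);
    printf("friend %d: update %d\n", friend_id, update);
    return NULL;
}

/* Starts one thread per friend, then waits for all of them.
 * Returns the number of threads that were started and joined. */
int display_homepage(int *friendlist, size_t n) {
    assert(friendlist != NULL || n == 0);
    pthread_t *threads = malloc(n * sizeof *threads);
    if (threads == NULL && n > 0)
        return 0;

    size_t started = 0;
    for (size_t i = 0; i < n; i++) {
        /* Each thread receives a pointer to its own array element. */
        if (pthread_create(&threads[i], NULL, fetch_and_display,
                           &friendlist[i]) != 0)
            break;
        started++;
    }
    for (size_t i = 0; i < started; i++)
        pthread_join(threads[i], NULL);

    free(threads);
    return (int)started;
}
```

Calling display_homepage on an array of four IDs starts four threads; the order of the printed lines can differ from run to run. One thread per friend is the slide's simplification and would not scale to very long friend lists.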
18
Punch line: We Must
Parallelize All Software!
You will learn it in ECE 454
19
But…
• So far we have discussed only the CPU
• But is it true that a faster CPU always implies a faster program?
• The same program may run slower on a faster CPU. Why?
void display_homepage (user) {
    friendlist = get_friendlist (user);
    foreach (friend in friendlist) {
        update = get_update_status (friend);
        display (update);
    }
}
20
Storage Hierarchy
• Your program needs to access data. That takes time!
21
Numbers Everyone Should Know
Operation                               Latency            Notes
L1 cache reference                      0.5 ns             L1 cache size: < 10 KB
Branch misprediction                    5 ns
L2 cache reference                      7 ns               L2 cache size: hundreds of KB
Mutex lock/unlock                       100 ns
Main memory reference                   100 ns             memory size: GBs
Send 2K bytes over 1 Gbps network       20,000 ns
Read 1 MB sequentially from memory      250,000 ns
Round trip within same datacenter       500,000 ns
Flash drive read                        40,000 ns
Disk seek                               10,000,000 ns      10 milliseconds
Read 1 MB sequentially from network     10,000,000 ns
Read 1 MB sequentially from disk        30,000,000 ns
Send packet Cal.->Netherlands->Cal.     150,000,000 ns
Data from Jeff Dean
• 1 ns = 1/1,000,000,000 second
• For a 2 GHz CPU, 1 cycle = 0.5 ns
22
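The cycle arithmetic behind the latency numbers can be sketched in a few lines of C. The helper names here are illustrative, not from the slides; the only fact used is that a clock of f GHz completes f cycles per nanosecond.

```c
#include <assert.h>
#include <stdio.h>

/* Converts a latency in nanoseconds to CPU cycles at the given clock rate.
 * A clock of f GHz completes f cycles per nanosecond. */
double ns_to_cycles(double latency_ns, double clock_ghz) {
    assert(clock_ghz > 0.0);
    return latency_ns * clock_ghz;
}

/* Prints a few of the listed latencies as cycle counts at 2 GHz. */
void print_cycle_costs(void) {
    const double clock_ghz = 2.0; /* the slide's example clock rate */
    printf("L1 cache reference:    %.0f cycles\n",
           ns_to_cycles(0.5, clock_ghz));
    printf("Main memory reference: %.0f cycles\n",
           ns_to_cycles(100.0, clock_ghz));
    printf("Disk seek:             %.0f cycles\n",
           ns_to_cycles(10000000.0, clock_ghz));
}
```

At 2 GHz a main memory reference costs about 200 cycles and a disk seek about 20 million, which is why the rest of the course cares so much about where data lives.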
Performance Optimization is About
Finding the bottleneck
• If you can avoid unnecessary disk I/O
  • Your program can run 100,000 times faster
  • Have you heard of memcached, which Facebook uses heavily?
• If you allocate your memory in a smart way
  • Your data can fit entirely in cache
  • Your program can be another 100 times faster
  • You will learn this in the lab assignments
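One common way to see the effect of cache-friendly access is to traverse the same array in two different orders. This is a generic illustration, not the technique from the labs; the array size and function names are assumptions.

```c
#include <assert.h>
#include <stddef.h>

#define ROWS 512
#define COLS 512

/* Row-major traversal: consecutive accesses touch adjacent addresses,
 * so every cache line that is fetched gets fully used. */
long sum_row_major(int (*grid)[COLS]) {
    assert(grid != NULL);
    long total = 0;
    for (size_t r = 0; r < ROWS; r++)
        for (size_t c = 0; c < COLS; c++)
            total += grid[r][c];
    return total;
}

/* Column-major traversal: consecutive accesses are COLS * sizeof(int)
 * bytes apart, so each access tends to land on a different cache line. */
long sum_col_major(int (*grid)[COLS]) {
    assert(grid != NULL);
    long total = 0;
    for (size_t c = 0; c < COLS; c++)
        for (size_t r = 0; r < ROWS; r++)
            total += grid[r][c];
    return total;
}
```

Both functions return the same sum. Any difference in running time comes from the access pattern alone, and its size depends on the machine and the compiler flags.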
23
Back to the Facebook Example
void display_homepage (user) {
    friendlist = get_friendlist (user);
    foreach (friend in friendlist) {
        pthread_create(fetch_and_display, friend);
    }
}
void fetch_and_display (friend) {
    update = get_update_status (friend);
    display (update);
}
Challenge: the data is too large!
100 Petabytes = 100,000 x my laptop
24
Back to the Facebook Example
void display_homepage (user) {
    friendlist = get_friendlist (user);
    updates = MULTI_GET ("updates", friendlist);
    display (updates);
}
• Opt. 1: parallelization + distribution
• Opt. 2: store in memory instead of hard disk
[Figure: MULTI_GET sends requests to several servers, each holding friends' data (FriendA, FriendB, FriendC) in memory]
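The "store in memory" idea can be sketched on a single machine as a small in-memory cache in front of a slow lookup. Everything here is hypothetical: the direct-mapped table, the function names, and the simulated disk read are illustrative and are not memcached's API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define CACHE_SLOTS 1024

struct cache_entry {
    bool valid;
    int friend_id;
    int update;
};

static struct cache_entry cache[CACHE_SLOTS];
static int slow_fetches = 0; /* counts simulated disk reads */

/* Hypothetical slow path standing in for a disk or database read. */
static int fetch_update_from_disk(int friend_id) {
    slow_fetches++;
    return friend_id * 10;
}

/* Returns the update for one friend, going to "disk" only on a miss. */
int get_update(int friend_id) {
    assert(friend_id >= 0);
    struct cache_entry *e = &cache[friend_id % CACHE_SLOTS];
    if (e->valid && e->friend_id == friend_id)
        return e->update; /* hit: served from memory */
    e->valid = true;
    e->friend_id = friend_id;
    e->update = fetch_update_from_disk(friend_id);
    return e->update;
}

/* Batched lookup in the spirit of MULTI_GET: one call, many keys. */
void multi_get(const int *friendlist, size_t n, int *updates) {
    for (size_t i = 0; i < n; i++)
        updates[i] = get_update(friendlist[i]);
}

int slow_fetch_count(void) {
    return slow_fetches;
}
```

Looking up the same friends twice goes to the slow path only the first time. A real deployment would spread the table across many servers, which is the "distribution" half of the slide.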
25
Course Content
26
Course Breakdown
• Module 1: Code measurement and optimization
• Module 2: Memory management and optimization
• Module 3A: Multi-core parallelization
• Module 3B: Multi-machine parallelization
27
1) Code Measurement and Optimization
• Topics
• Finding the bottleneck!
• Code optimization principles
• Measuring time on a computer and profiling
• Understanding and using an optimizing compiler
• Assignments
• Lab1: Compiler optimization and program profiling
• Basic performance profiling, finding the bottleneck
28
2) Memory Management and Optimization
• Topics
• Memory hierarchy
• Caches and locality
• Virtual memory
• Note: all involve aspects of software, hardware, and OS
• Assignments
• Lab2: Optimizing memory performance
• Profiling, measurement, locality enhancements for cache performance
• Lab3: Writing your own memory allocator package
• Understanding dynamic memory allocation (malloc)
29
3) Parallelization
• Topics
• A: Parallel/multicore architectures (high-level understanding)
• Threads and threaded programming
• Synchronization and performance
• B: Parallelization on multiple machines
• Big data & cloud computing
• Assignments
• Lab4: Threads and synchronization methods
• Understanding synchronization and performance
• Lab5: Parallelizing a game simulation program
• Parallelizing and optimizing a program for multicore performance
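Synchronization is needed as soon as threads share data. A minimal sketch of a mutex-protected counter; the thread and iteration counts are arbitrary and the example is not taken from Lab4.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define NUM_THREADS 4
#define INCREMENTS_PER_THREAD 100000

static long counter = 0;
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;

/* Each thread increments the shared counter under the lock. Without the
 * lock, concurrent read-modify-write updates could be lost. */
static void *increment_many(void *arg) {
    (void)arg;
    for (int i = 0; i < INCREMENTS_PER_THREAD; i++) {
        pthread_mutex_lock(&counter_lock);
        counter++;
        pthread_mutex_unlock(&counter_lock);
    }
    return NULL;
}

/* Runs NUM_THREADS incrementing threads and returns the final count. */
long run_counter_demo(void) {
    pthread_t threads[NUM_THREADS];
    counter = 0;
    for (size_t i = 0; i < NUM_THREADS; i++) {
        int rc = pthread_create(&threads[i], NULL, increment_many, NULL);
        assert(rc == 0);
        (void)rc;
    }
    for (size_t i = 0; i < NUM_THREADS; i++)
        pthread_join(threads[i], NULL);
    return counter;
}
```

The lock makes the result correct but serializes every increment, which is the tension between synchronization and performance that this module explores.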
30
The Big Picture
[Figure: a multicore machine with cores, caches, and memory, annotated with the course topics]
• Topic 1: code optimization
• Topic 2: memory management
• Topic 3A: multi-core parallelization
• Topic 3B: parallelization using the cloud
31
The Bigger Picture
• Optimization is not the ONLY goal!
1) Readability
2) Debuggability
3) Reliability
4) Maintainability
5) Scalability
6) Efficiency
• Goals 1-4: more important than performance!!!!
• “Premature optimization is the root of all evil!” (Donald Knuth)
32
Example 1
• Premature optimization causing bugs
• cp /proc/cpuinfo .
• Created an empty file!!! (Demo)
bool copy_reg (.. ) {
    if (src.st_size != 0) {    /* Premature optimization!!! */
        /* Copy the file content */
    }
    else {
        /* skip the copy if the file size = 0 */
    }
}
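Files under /proc typically report an st_size of 0 even though reading them returns data, which is why the size check above skips the copy. A sketch of the safer approach: read until end-of-file instead of trusting the reported size. The function name is an assumption, and this is not the actual cp fix.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/* Copies everything readable from in_fd to out_fd, stopping only when
 * read() reports end-of-file. Returns the bytes copied, or -1 on error. */
long copy_until_eof(int in_fd, int out_fd) {
    assert(in_fd >= 0 && out_fd >= 0);
    char buf[4096];
    long total = 0;
    for (;;) {
        ssize_t n = read(in_fd, buf, sizeof buf);
        if (n == 0)
            break; /* end of file: the only reliable "size is zero" signal */
        if (n < 0) {
            if (errno == EINTR)
                continue; /* interrupted by a signal: retry */
            return -1;
        }
        ssize_t off = 0;
        while (off < n) { /* write() may accept fewer bytes than asked */
            ssize_t w = write(out_fd, buf + off, (size_t)(n - off));
            if (w < 0) {
                if (errno == EINTR)
                    continue;
                return -1;
            }
            off += w;
        }
        total += n;
    }
    return total;
}
```

The cost is one extra read() call for files that really are empty, a small price compared with silently producing an empty copy.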
33
Example 2
• Optimization might reduce readability
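/* Version 1: shift and subtract */
int count (unsigned x) {
    int sum = x;
    while (x != 0) {
        x = x >> 1;
        sum = sum - x;
    }
    return sum;
}

/* Version 2: rotate and sum (assumes a 32-bit unsigned and a rotate-left helper, rotatel) */
int count (unsigned x) {
    int sum, i;
    sum = x;
    for (i = 1; i <= 31; i++) {
        x = rotatel(x, 1);
        sum = sum + x;
    }
    return -sum;
}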
They both count the number of ‘1’ bits in ‘x’.
How will someone else maintain this code?
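A sketch that puts both tricks next to a plain, readable loop so they can be checked against each other. It assumes 32-bit values via uint32_t and supplies its own rotatel helper; the function names are illustrative. The rotate variant needs all 31 rotations (i from 1 through 31) so that the 32 rotations of x, including x itself, sum to -popcount modulo 2^32.

```c
#include <assert.h>
#include <stdint.h>

/* Rotates a 32-bit value left by n bits (0 < n < 32). */
static uint32_t rotatel(uint32_t x, int n) {
    assert(n > 0 && n < 32);
    return (x << n) | (x >> (32 - n));
}

/* Shift-and-subtract: x - x/2 - x/4 - ... equals the number of 1 bits. */
int count_shift(uint32_t x) {
    uint32_t sum = x;
    while (x != 0) {
        x >>= 1;
        sum -= x;
    }
    return (int)sum;
}

/* Rotate-and-sum: the 32 rotations of x add up to popcount * (2^32 - 1),
 * which is -popcount modulo 2^32. */
int count_rotate(uint32_t x) {
    uint32_t sum = x;
    for (int i = 1; i <= 31; i++) {
        x = rotatel(x, 1);
        sum += x;
    }
    return (int)(0u - sum);
}

/* The readable version: test each bit in turn. */
int count_plain(uint32_t x) {
    int bits = 0;
    for (; x != 0; x >>= 1)
        bits += (int)(x & 1u);
    return bits;
}
```

All three agree on every input, but only the last one explains itself to the next maintainer.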
34
But how do I know if my optimization is
“premature”?
• Hard to answer…
• “Make it work; make it right; make it fast” (Butler Lampson)
• Purpose of my program?
  • E.g., will it have a long lifetime, or is it for one-time use (e.g., a
    hackathon or ACM programming contest)?
• Am I optimizing for the bottleneck?
  • E.g., if the program is doing a lot of I/O, there is no point in
    optimizing “count the number of bits in an integer”
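A standard back-of-envelope check for "am I optimizing the bottleneck?" is Amdahl's law, which the slides do not name. If a fraction f of the running time is sped up by a factor s, the overall speedup is 1 / ((1 - f) + f / s). A minimal sketch with an illustrative function name:

```c
#include <assert.h>

/* Amdahl's law: overall speedup when `fraction` of the running time
 * (between 0 and 1) is made `local_speedup` times faster. */
double overall_speedup(double fraction, double local_speedup) {
    assert(fraction >= 0.0 && fraction <= 1.0);
    assert(local_speedup > 0.0);
    return 1.0 / ((1.0 - fraction) + fraction / local_speedup);
}
```

If bit counting is 1% of the running time, even a 100x faster bit counter yields an overall speedup of about 1.01x. Doubling the speed of a part that takes half the time yields about 1.33x.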
35
But how do I know if my optimization is
“premature”?
• Am I optimizing for the common case or special case?
• E.g., the “cp” bug was optimizing for a special case…
• What’s the price I pay?
• E.g., reduced readability, increased program size, etc.
36