
Lec01 1 Introduction

ECE 454 is a Computer Systems Programming course that covers system programming, software/hardware interactions, and performance optimization. The course includes labs, assignments, and tests, with a focus on parallelization and memory management. Communication is conducted through official UoT email, and academic integrity is emphasized with strict policies against cheating.


ECE 454

Computer Systems Programming


Introduction

1
Introduction

• Course Outline: Description, Subjects, Requirements, Evaluation, Schedule, Important Dates

• Course Overview

2
Recommended Textbook

• Textbook is not essential


• The relevant contents will be covered in the slides
• The links for some online resources will be posted

• Textbook:
Computer Systems: A Programmer's Perspective
Authors: Randal E. Bryant and David R. O'Hallaron
Publisher: Prentice Hall, 3rd Edition, 2015

3
Communication

• All email communications in this course MUST be done using the official UoT email accounts

• Put the course code (ECE454) at the start of the subject line

• Quercus is the Learning Management System

4
Evaluation

• Participation* 5%
• Labs (5 Labs, covering assignments)
• Assignments: 40%
• Tests 1&2: 20% (Oct18 – 10%, Nov15 – 10%)
• Final Exam: 35% (TBD)
* Find details in the course outline

5
Labs

• You need to work on all Lab Assignments in a Team of 2
• More details in the Assignments’ guidelines
• Lab submission
• Electronic submission only through Quercus
• Follow the instructions defined in the Lab guideline

6
Note!

• In case of cheating, the mark is 0 and an official letter is placed in your file


• What is cheating?
• Using someone else’s solution to finish your assignment
• Sharing code with others
• What is NOT cheating?
• Helping others use systems or tools
• Helping others with high-level design issues
• We do use cheater-beaters
• They automatically compare your solutions with others’

7
Questions?

8
Course Objectives (1)

• System Programming
• Most engineering jobs involve System Programming
• System Programmers are increasingly in demand
• A system programmer is worth 1000x normal – Bill Gates

9
Course Objectives (2)

• Get a better understanding of software/hardware interactions
• Important whether you are software- or hardware-oriented
• Considering a programming job or grad school
• Computing is at the heart of many interesting projects today

10
Start a Company in your 20’s!

11
Founders of Successful Tech Companies Are Mostly Middle-Aged

Tony Fadell started Nest in 2010, after leading the engineering team that created the iPod and playing a crucial role in the development of the iPhone. Like many entrepreneurs, he was then over 40. (NY Times, Aug 29, 2019)
12
Objectives in Programming (1)
• Productivity (choice of language, programming practices)
• Readability
• Debuggability
• Reliability
• Maintainability
• Performance (systems understanding)
• Scalability
• Efficiency

13
Objectives in Programming (2)
• Suppose you’re building the “homepage” feature

void display_homepage (user) {
    friendlist = get_friendlist (user);
    foreach (friend in friendlist) {
        update = get_update_status (friend);
        display (update);
    }
}

How can I double the speed of this program?
Easy: TAKE ECE 454!!!

14
Multicores - Present and Future
• 2x cores every 1-2yrs: 1000 cores by 2020!?

[Figure: processor chips with growing core counts: Pentium IV, Core 2 Duo, Core 2 Quad, 8-core, 16-core]

15
Only One Sequential Program to Run?
void display_homepage (user) {
    friendlist = get_friendlist (user);
    foreach (friend in friendlist) {
        update = get_update_status (friend);
        display (update);
    }
}

[Figure: over time, the sequential program keeps only one core busy. On a 2-core chip, one core is idle; on a 16-core chip, 15 cores are idle!]

16
Improving Execution Time
[Figure: execution time of a single program on one core versus spread across four cores]

• Need parallel threads to reduce execution time

17
void display_homepage (user) {
    friendlist = get_friendlist (user);
    foreach (friend in friendlist) {
        pthread_create(fetch_and_display, friend);
    }
}

void fetch_and_display (friend) {
    update = get_update_status (friend);
    display (update);
}

[Figure: four fetch_and_display threads, each running on its own core]
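The pseudocode above abbreviates the pthread_create call. As a minimal sketch of how the same idea might look in real C, the version below passes each friend to its own thread through a small task struct and then waits for every thread with pthread_join. The names fetch_task, MAX_FRIENDS, and the body of get_update_status are assumptions made for illustration; they are not part of the course material.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>

#define MAX_FRIENDS 16   /* illustrative upper bound, not from the slides */

/* Hypothetical stand-in for the slide's get_update_status(): a real
 * system would fetch the update from storage or over the network. */
static int get_update_status(int friend_id) {
    return friend_id * 10;
}

/* Per-thread work item: the input friend and a slot for the result. */
struct fetch_task {
    int friend_id;
    int update;
};

/* Thread entry point; pthread_create requires this void *(*)(void *) shape. */
static void *fetch_and_display(void *arg) {
    struct fetch_task *task = arg;
    task->update = get_update_status(task->friend_id);
    printf("friend %d: update %d\n", task->friend_id, task->update);
    return NULL;
}

/* Starts one thread per friend, waits for all started threads, and
 * returns how many threads were started. */
static int display_homepage(struct fetch_task tasks[], int n) {
    pthread_t threads[MAX_FRIENDS];
    int started = 0;

    assert(n >= 0 && n <= MAX_FRIENDS);
    for (int i = 0; i < n; i++) {
        tasks[i].friend_id = i;
        tasks[i].update = -1;
        if (pthread_create(&threads[i], NULL, fetch_and_display, &tasks[i]) != 0)
            break;   /* stop creating threads on the first failure */
        started++;
    }
    for (int i = 0; i < started; i++)
        pthread_join(threads[i], NULL);
    return started;
}
```

Calling display_homepage(tasks, 4) with an array of four fetch_task entries starts four threads; the order of the printed lines can differ from run to run because the threads execute concurrently.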

18
Punch line: We Must Parallelize All Software!

• You will learn it in ECE 454

19
But…
• So far we only discussed the CPU
• But is it true that a faster CPU always implies a faster program?
• The same program may run slower on a faster CPU. Why?

void display_homepage (user) {
    friendlist = get_friendlist (user);
    foreach (friend in friendlist) {
        update = get_update_status (friend);
        display (update);
    }
}

20
Storage Hierarchy
• Your program needs to access data. That takes time!

21
Numbers Everyone Should Know
• L1 cache reference 0.5 ns (L1 cache size: < 10 KB)
• Branch misprediction 5 ns
• L2 cache reference 7 ns (L2 cache size: hundreds of KB)
• Mutex lock/unlock 100 ns
• Main memory reference 100 ns (mem size: GBs)
• Send 2K bytes over 1 Gbps network 20,000 ns
• Read 1 MB sequentially from memory 250,000 ns
• Round trip within same datacenter 500,000 ns
• Flash drive read 40,000 ns
• Disk seek 10,000,000 ns (10 milliseconds)
• Read 1 MB sequentially from network 10,000,000 ns
• Read 1 MB sequentially from disk 30,000,000 ns
• Send packet Cal.->Netherlands->Cal. 150,000,000 ns
Data from Jeff Dean
• *1 ns = 1/1,000,000,000 second
• For a 2 GHz CPU, 1 cycle = 0.5 ns

22
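To make the latencies above easier to compare, the sketch below converts nanoseconds into CPU cycles, using the fact that a clock of f GHz completes f cycles per nanosecond. The helper name ns_to_cycles is an assumption made for illustration.

```c
#include <assert.h>

/* Converts a latency in nanoseconds into CPU cycles at the given clock
 * rate. A clock of f GHz completes f cycles every nanosecond. */
static double ns_to_cycles(double latency_ns, double clock_ghz) {
    assert(latency_ns >= 0.0 && clock_ghz > 0.0);
    return latency_ns * clock_ghz;
}
```

On the 2 GHz CPU from the slide, ns_to_cycles(0.5, 2.0) gives 1 cycle for an L1 cache reference, ns_to_cycles(100.0, 2.0) gives 200 cycles for a main memory reference, and ns_to_cycles(10000000.0, 2.0) gives 20,000,000 cycles for a disk seek.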
Performance Optimization is About Finding the Bottleneck
• If you can avoid unnecessary disk I/O
• Your program can run 100,000 times faster (a 10 ms disk seek vs. a 100 ns main memory reference)
• Have you heard of Facebook’s memcached?

• If you allocate your memory in a smart way
• Your data can fit entirely in cache
• Your program can be another 100 times faster
• You will learn this in lab assignments
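Cache behaviour depends on how a program walks through its data, not only on how much data it has. As a rough sketch of that effect (not taken from the slides), the two functions below compute the same sum over a two-dimensional array but visit its elements in different orders. The names ROWS, COLS, fill_grid, sum_row_major, and sum_col_major are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define ROWS 512
#define COLS 512

/* Fills the grid with a simple, known pattern. */
static void fill_grid(int grid[ROWS][COLS]) {
    assert(grid != NULL);
    for (size_t r = 0; r < ROWS; r++)
        for (size_t c = 0; c < COLS; c++)
            grid[r][c] = (int)((r + c) % 7);
}

/* Row-major traversal: consecutive accesses touch adjacent addresses,
 * so every cache line that is fetched gets fully used. */
static long sum_row_major(int grid[ROWS][COLS]) {
    long sum = 0;
    for (size_t r = 0; r < ROWS; r++)
        for (size_t c = 0; c < COLS; c++)
            sum += grid[r][c];
    return sum;
}

/* Column-major traversal of the same array: consecutive accesses are
 * COLS * sizeof(int) bytes apart, which tends to cause more cache misses. */
static long sum_col_major(int grid[ROWS][COLS]) {
    long sum = 0;
    for (size_t c = 0; c < COLS; c++)
        for (size_t r = 0; r < ROWS; r++)
            sum += grid[r][c];
    return sum;
}
```

Both functions return the same value; any difference in running time comes from the access pattern alone. How large that difference is depends on the machine and the compiler settings, so it is worth measuring rather than assuming.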

23
Back to the Facebook Example
void display_homepage (user) {
    friendlist = get_friendlist (user);
    foreach (friend in friendlist) {
        pthread_create(fetch_and_display, friend);
    }
}

void fetch_and_display (friend) {
    update = get_update_status (friend);
    display (update);
}

Challenge: the data is too large!
100 Petabytes = 100,000 x my laptop
24
Back to the Facebook Example
void display_homepage (user) {
    friendlist = get_friendlist (user);
    updates = MULTI_GET (“updates”, friendlist);
    display (updates);
}

• Opt. 1: parallelization + distribution
• Opt. 2: store in memory instead of hard disk

[Figure: MULTI_GET fans out to several servers, each holding the data for friends (FriendA, FriendB, FriendC) in memory]

25
Course Content

26
Course Breakdown
• Module 1: Code measurement and optimization

• Module 2: Memory management and optimization

• Module 3A: Multi-core parallelization

• Module 3B: Multi-machine parallelization

27
1) Code Measurement and Optimization
• Topics
• Finding the bottleneck!
• Code optimization principles
• Measuring time on a computer and profiling
• Understanding and using an optimizing compiler

• Assignments
• Lab1: Compiler optimization and program profiling
• Basic performance profiling, finding the bottleneck

28
2) Memory Management and Opt.
• Topics
• Memory hierarchy
• Caches and locality
• Virtual memory
• Note: all involve aspects of software, hardware, and OS

• Assignments
• Lab2: Optimizing memory performance
• Profiling, measurement, locality enhancements for cache performance
• Lab3: Writing your own memory allocator package
• Understanding dynamic memory allocation (malloc)

29
3) Parallelization
• Topics
• A: Parallel/multicore architectures (high-level understanding)
• Threads and threaded programming
• Synchronization and performance
• B: Parallelization on multiple machines
• Big data & cloud computing

• Assignments
• Lab4: Threads and synchronization methods
• Understanding synchronization and performance
• Lab5: (Parallelizing a game simulation program)
• Parallelizing and optimizing a program for multicore performance

30
The Big Picture
[Figure: cores (C), caches, and memory, annotated with the course topics]

• Topic 1: code optimization
• Topic 2: mem. management
• Topic 3A: multi-core parallelization
• Topic 3B: parallelization using the cloud

31
The Bigger Picture
• Optimization is not the ONLY goal!
1) Readability
2) Debuggability
3) Reliability
4) Maintainability
More important than performance!!!!
5) Scalability
6) Efficiency

“Premature optimization is the root of all evil!” – Donald Knuth

32
Example 1
• Premature optimization causing bugs
• cp /proc/cpuinfo .
• Created an empty file!!! (Demo)

bool copy_reg (.. ) {
    if (src.st_size != 0) {    /* Premature optimization!!! */
        /* Copy the file content */
    }
    else {
        /* skip the copy if the file size = 0 */
    }
}
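Files under /proc typically report a size of 0 because their contents are generated when they are read, so a size check like the one above skips data that is really there. As a minimal sketch of the safer approach, the function below ignores the reported size and simply reads until end of file. The name copy_stream and the buffer size are assumptions made for illustration; this is not the actual cp implementation.

```c
#include <assert.h>
#include <stdio.h>

/* Copies everything readable from `in` to `out` and returns the number of
 * bytes copied, or -1 on a read or write error. The reported file size is
 * never consulted, so files that report size 0 but still produce data
 * are copied correctly. */
static long copy_stream(FILE *in, FILE *out) {
    char buf[4096];
    long total = 0;
    size_t n;

    assert(in != NULL && out != NULL);
    while ((n = fread(buf, 1, sizeof buf, in)) > 0) {
        if (fwrite(buf, 1, n, out) != n)
            return -1;
        total += (long)n;
    }
    return ferror(in) ? -1 : total;
}
```

For a truly empty file the loop body never runs and the function returns 0, so the common case costs almost nothing and no special case is needed.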

33
Example 2
• Optimization might reduce readability

int count (unsigned x) {
    int sum = x;
    while (x != 0) {
        x = x >> 1;
        sum = sum - x;
    }
    return sum;
}

int count (unsigned x) {
    int sum, i;
    sum = x;
    for (i = 1; i <= 31; i++) {
        x = rotatel(x, 1);
        sum = sum + x;
    }
    return -sum;
}

They both count the number of ‘1’ bits in ‘x’.


How will someone else maintain this code?
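To show why both tricks work, here is a hedged, runnable sketch that places them next to a plain bit-by-bit count. The shift version relies on x minus the sum of all its right shifts being equal to the number of set bits. The rotate version relies on the sum of every rotation of x (32 of them for a 32-bit word, so 31 rotations after the initial value) being equal to minus the bit count modulo 2^32. The names count_simple, count_shift, count_rotate, and check_counts_agree are assumptions made for illustration, and rotatel is written out here because standard C has no such function.

```c
#include <assert.h>
#include <limits.h>

/* Number of bits in an unsigned int on this platform. */
#define UNSIGNED_BITS ((unsigned)(sizeof(unsigned) * CHAR_BIT))

/* Readable reference version: test each bit in turn. */
static int count_simple(unsigned x) {
    int n = 0;
    while (x != 0) {
        n += (int)(x & 1u);
        x >>= 1;
    }
    return n;
}

/* Shift trick: x - (x >> 1) - (x >> 2) - ... is the number of set bits. */
static int count_shift(unsigned x) {
    unsigned sum = x;
    while (x != 0) {
        x >>= 1;
        sum -= x;
    }
    return (int)sum;
}

/* Rotates x left by k bits (helper assumed for this sketch). */
static unsigned rotatel(unsigned x, unsigned k) {
    k %= UNSIGNED_BITS;
    return k == 0 ? x : (x << k) | (x >> (UNSIGNED_BITS - k));
}

/* Rotate trick: adding up every rotation of x gives minus the bit count
 * in wrap-around unsigned arithmetic, so negating the sum recovers it. */
static int count_rotate(unsigned x) {
    unsigned sum = x;
    for (unsigned i = 1; i < UNSIGNED_BITS; i++) {
        x = rotatel(x, 1);
        sum += x;
    }
    return (int)(0u - sum);
}

/* Checks that both tricks agree with the readable version for one input. */
static void check_counts_agree(unsigned x) {
    assert(count_shift(x) == count_simple(x));
    assert(count_rotate(x) == count_simple(x));
}
```

The sums are kept in unsigned variables so that wrap-around is well defined in C. Even with comments, the two tricks take far more effort to verify than count_simple, which is the maintainability cost the slide is pointing at.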

34
But how do I know if my optimization is “premature”?
• Hard to answer…

• “Make it work; Make it right; Make it Fast” – Butler Lampson

• Purpose of my program?
• E.g., will it have a long lifetime, or is it a one-time program (e.g., hackathon or ACM programming contest)?

• Am I optimizing for the bottleneck?
• E.g., if the program is doing a lot of I/O, there is no point in optimizing “count the number of bits in an integer”

35
But how do I know if my optimization is “premature”?
• Am I optimizing for the common case or a special case?
• E.g., the “cp” bug was optimizing for a special case…

• What’s the price I pay?
• E.g., reduced readability, increased program size, etc.

36
