Dr. Mainak Chaudhuri: Instructor

This document summarizes a course on program optimization for multi-core architectures. It is taught by Dr. Mainak Chaudhuri, Dr. S.K. Aggarwal, and Dr. Rajat Moona from IIT Kanpur's computer science department. The document outlines the course agenda, which includes an overview of the evolution of processor architecture from unpipelined microprocessors to more advanced techniques like out-of-order execution, multiple issue, and the challenges of Moore's Law. It then provides more details on topics like pipelining, hazards, and how techniques like out-of-order execution aim to find and exploit more instruction-level parallelism to improve processor throughput.

NPTEL Online - IIT Kanpur

Course Name: Program Optimization for Multi-core Architecture

Department: Computer Science and Engineering, IIT Kanpur

Instructors: Dr. Mainak Chaudhuri, Dr. S. K. Aggarwal, Dr. Rajat Moona

file:///D|/...audhary,%20Dr.%20Sanjeev%20K%20Aggrwal%20&%20Dr.%20Rajat%20Moona/Multi-core_Architecture/lecture1/main.html[6/14/2012 11:17:07 AM]

Module 1: Multi-core: The Ultimate Dose of Moore's Law

Lecture 1: Evolution of Processor Architecture
The Lecture Contains:
Mind-boggling Trends in Chip Industry
Agenda
Unpipelined Microprocessors
Pipelining
Pipelining Hazards
Control Dependence
Data Dependence
Structural Hazard
Out-of-order Execution
Multiple Issue
Out-of-Order Multiple Issue
Moore's Law

Mind-boggling Trends in Chip Industry
Long history since 1971
Introduction of Intel 4004
http://www.intel4004.com/
Today we talk about more than one billion transistors on a chip
Intel Montecito (on the market since July 2006) has 1.7 billion transistors
Die size has increased steadily (what is a die?)
Intel Prescott: 112 mm², Intel Pentium 4EE: 237 mm², Intel Montecito: 596 mm²
Minimum feature size has shrunk from 10 microns in 1971 to 0.045 micron today

Agenda
Unpipelined microprocessors
Pipelining: simplest form of ILP
Out-of-order execution: more ILP
Multiple issue: drink more ILP
Scaling issues and Moore's Law
Why multi-core
TLP and de-centralized design
Tiled CMP and shared cache
Implications on software
Research directions

Unpipelined Microprocessors
Typically an instruction enjoys five phases in its life
Instruction fetch from memory
Instruction decode and operand register read
Execute
Data memory access
Register write
Unpipelined execution would take a long single cycle or multiple short cycles
Only one instruction inside processor at any point in time

Pipelining
One simple observation
In an unpipelined processor, exactly one piece of hardware is active at any point in time
Why not fetch a new instruction every cycle?
Five instructions in five different phases
Throughput increases five times (ideally)
The bottom line is
If consecutive instructions are independent, they can be processed in parallel
The first form of instruction-level parallelism (ILP)
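
The "five times (ideally)" figure can be checked with a little arithmetic. The sketch below assumes one cycle per phase, no hazards, and a stream of independent instructions; the function names are illustrative and do not come from the lecture.

```python
def unpipelined_cycles(n_instructions, stages=5):
    # Each instruction occupies the whole processor for all of its phases.
    return n_instructions * stages


def pipelined_cycles(n_instructions, stages=5):
    # The first instruction needs `stages` cycles to drain through the pipe;
    # after that, one instruction completes every cycle.
    if n_instructions == 0:
        return 0
    return stages + n_instructions - 1


def speedup(n_instructions, stages=5):
    return unpipelined_cycles(n_instructions, stages) / pipelined_cycles(n_instructions, stages)


if __name__ == "__main__":
    for n in (1, 5, 100, 1000):
        print(n, unpipelined_cycles(n), pipelined_cycles(n), round(speedup(n), 2))
```

For a single instruction there is no gain at all; the speedup only approaches five as the instruction stream gets long, which is why the slide qualifies the claim with "ideally".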

Pipelining Hazards
Instruction dependence limits achievable parallelism
Control and data dependence (aka hazards)
Finite amount of hardware limits achievable parallelism
Structural hazards
Control dependence
On average, every fifth instruction is a branch (coming from if-else, for, do-while, etc.)
Branches execute in the third phase
Introduces bubbles unless you are smart
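
A rough cost model for these bubbles, as a sketch: it assumes an ideal CPI of 1, that a branch resolved in the third phase leaves two fetch slots uncertain, and that every mispredicted (or unpredicted) branch wastes both slots. The parameter names are illustrative.

```python
def cycles_per_instruction(branch_fraction=0.2, bubbles_per_branch=2, miss_rate=1.0):
    # miss_rate = 1.0 models "fetch nothing": every branch pays the full penalty.
    # A lower miss_rate models a predictor that is wrong only some of the time.
    return 1.0 + branch_fraction * bubbles_per_branch * miss_rate


if __name__ == "__main__":
    print(cycles_per_instruction())                # always stall on a branch
    print(cycles_per_instruction(miss_rate=0.03))  # predictor wrong 3% of the time
```

Under these assumptions, stalling on every branch costs about 40% extra cycles, while a predictor that is wrong 3% of the time costs about 1%.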

Control Dependence

What do you fetch in X and Y slots?

Options: fetch nothing, fetch the fall-through path, or learn from past history and predict (today's best predictors achieve 97% accuracy on average for SPEC2000)
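
To make "learn past history and predict" concrete, here is a minimal sketch of one classic scheme, a two-bit saturating counter for a single branch. The lecture does not name a specific predictor, and the predictors behind the 97% figure are considerably more sophisticated; the class and function names are illustrative.

```python
class TwoBitPredictor:
    """Counter values 0-1 predict not-taken, 2-3 predict taken."""

    def __init__(self):
        self.counter = 2  # start in the "weakly taken" state (an arbitrary choice)

    def predict(self):
        return self.counter >= 2

    def update(self, taken):
        # Move one step toward the observed outcome, saturating at 0 and 3.
        if taken:
            self.counter = min(3, self.counter + 1)
        else:
            self.counter = max(0, self.counter - 1)


def accuracy(outcomes):
    predictor = TwoBitPredictor()
    correct = 0
    for taken in outcomes:
        if predictor.predict() == taken:
            correct += 1
        predictor.update(taken)
    return correct / len(outcomes)


if __name__ == "__main__":
    # A loop branch: taken nine times, then falls through, repeated ten times.
    loop_branch = ([True] * 9 + [False]) * 10
    print(accuracy(loop_branch))
```

On this loop pattern the counter mispredicts only the loop exit, because a single not-taken outcome is not enough to flip its prediction.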

Data Dependence

Take three bubbles?
Back-to-back dependence is too frequent
Solution: Hardware bypass paths
Allow the ALU to bypass the produced value in time: not always possible

Data Dependence

Need a live bypass! (requires some negative time travel: not yet feasible in the real world)
No option but to take one bubble
Bigger problems: load latency is often high; you may not find the data in cache
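
The bubble counts quoted above (three without bypass, none for an ALU result with bypass, one after a load) can be reproduced with a small timing model. This is a sketch under stated assumptions: the five phases each take one cycle, the consumer immediately follows the producer, a value written in the register-write phase can be read only from the next cycle, and a bypassed value is usable in the cycle after it is produced. The names are illustrative.

```python
# Phase indices of the five-phase pipeline, relative to the cycle an instruction is fetched.
IF, ID, EX, MEM, WB = range(5)


def bubbles(producer, bypass):
    """Bubbles between a producer ("alu" or "load") and a dependent next instruction."""
    if not bypass:
        # Assumption: the register file cannot be written and read in the same cycle,
        # so the value becomes readable one cycle after the producer's register write.
        ready = WB + 1
        needed = 1 + ID  # the consumer is fetched one cycle later and reads in decode
    else:
        produced_in = EX if producer == "alu" else MEM
        ready = produced_in + 1  # forwarded value is usable in the following cycle
        needed = 1 + EX          # the consumer needs it at the start of its execute phase
    return max(0, ready - needed)


if __name__ == "__main__":
    print(bubbles("alu", bypass=False))   # back-to-back dependence, no bypass
    print(bubbles("alu", bypass=True))    # ALU result forwarded in time
    print(bubbles("load", bypass=True))   # load data arrives one cycle too late
```

The load case is the "negative time travel" problem: the data leaves memory in the same cycle the consumer would need it at the ALU input, so one bubble is unavoidable in this model.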

Structural Hazard

The usual solution is to add more resources

Out-of-order Execution

Results must become visible in-order
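
A minimal sketch of what in-order visibility means: instructions may finish executing in any order, but each one is allowed to commit only after every older instruction has committed. The model assumes any number of instructions can commit in the same cycle, and the completion times below are made-up illustrative values.

```python
def commit_cycles(completion_cycles):
    """completion_cycles[i] is the cycle in which instruction i (in program order)
    finishes executing. Returns the cycle in which each result becomes visible."""
    commits = []
    last_commit = 0
    for done in completion_cycles:
        # An instruction waits for its own completion and for all older commits.
        last_commit = max(last_commit, done)
        commits.append(last_commit)
    return commits


if __name__ == "__main__":
    # Instruction 0 is slow (say, a cache miss); younger ones finish early but must wait.
    print(commit_cycles([5, 2, 3, 9, 4]))
```

The younger instructions that finished in cycles 2 and 3 do useful work early, yet their results stay invisible until the slow instruction ahead of them commits in cycle 5.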

Multiple Issue

Results must become visible in-order

Out-of-order Multiple Issue
Some hardware nightmares
Complex issue logic to discover independent instructions
Increased pressure on cache
Impact of a cache miss is much bigger now in terms of lost opportunity
Various speculative techniques are in place to ignore the slow and stupid memory
Increased impact of control dependence
Must feed the processor with multiple correct instructions every cycle
One cycle of bubble means lost opportunity of multiple instructions
Complex logic to verify
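
The job of the issue logic, discovering independent instructions, can be illustrated with a toy scheduler. This is a sketch, not a model of any real processor: every instruction has unit latency, a result is usable in the cycle after its producer issues, the out-of-order window covers the whole program, and the dependence lists are invented for illustration.

```python
def issue_cycles(deps, width, in_order):
    """deps[i] is the set of earlier instruction indices that instruction i depends on.
    Returns the number of cycles needed to issue every instruction."""
    issued_at = {}
    cycle = 0
    while len(issued_at) < len(deps):
        cycle += 1
        issued_now = []
        for i in range(len(deps)):
            if i in issued_at:
                continue
            # Ready only if every producer issued in an earlier cycle.
            ready = all(issued_at.get(d, cycle) < cycle for d in deps[i])
            if ready and len(issued_now) < width:
                issued_now.append(i)
            elif in_order:
                break  # in-order issue cannot look past a stalled instruction
        for i in issued_now:
            issued_at[i] = cycle
    return cycle


if __name__ == "__main__":
    # Three independent two-instruction chains: 0 -> 1, 2 -> 3, 4 -> 5.
    program = [set(), {0}, set(), {2}, set(), {4}]
    print(issue_cycles(program, width=1, in_order=True))
    print(issue_cycles(program, width=3, in_order=True))
    print(issue_cycles(program, width=3, in_order=False))
```

With this program, widening an in-order machine from one to three issue slots saves only two of six cycles, while looking past the stalled instruction lets the three-wide machine finish in two. The scan over all waiting instructions every cycle is a software stand-in for the complex issue logic the slide mentions.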

Moore's Law
Number of transistors on-chip doubles every 18 months
So much innovation was possible only because we had transistors
Phenomenal 58% performance growth every year
Moore's Law is facing a danger today
Power consumption is too high at multi-GHz clock frequencies, and it is proportional to the number of switching transistors
Wire delay doesn't decrease with transistor size
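
The growth figures in this lecture can be related to each other with a short calculation. The sketch below assumes smooth exponential growth; the 2,300-transistor count for the Intel 4004 is an outside figure not stated in the lecture, and the 35-year span (1971 to mid-2006) is rounded.

```python
import math


def doubling_time_months(growth_factor, years):
    """Months per doubling if a quantity grows by `growth_factor` over `years`."""
    return 12 * years / math.log2(growth_factor)


if __name__ == "__main__":
    # Transistor count: Intel 4004 (assumed 2,300) to Montecito (1.7 billion).
    print(round(doubling_time_months(1.7e9 / 2300, 35), 1))
    # Performance: 58% growth per year.
    print(round(doubling_time_months(1.58, 1), 1))
```

Under these assumptions, 58% yearly performance growth corresponds to a doubling roughly every 18 months, while the two transistor counts imply a doubling roughly every 21 to 22 months.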
