HARDWARE
MULTITHREADING
JAHANGIR ABBAS 15091519-091
SAAD MATEEN 15091519-098
SHAFAQAT ALI 15091519-137
What is a Thread?
A thread is a flow of execution through the
process code, with its own program counter
that keeps track of which instruction to
execute next, system registers which hold its
current working variables, and a stack which
contains the execution history.
A thread is also called a lightweight
process.
Types of Thread
Threads are implemented in the following two ways:
•User Level Threads − threads managed in user space, without kernel involvement.
•Kernel Level Threads − threads managed directly by the operating
system kernel.
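As a concrete illustration, kernel-level threads can be created from user code; in CPython each `threading.Thread` maps onto one OS-managed thread. A minimal sketch (the worker function and names are made up for the example):

```python
# Minimal sketch: spawning OS-managed (kernel-level) threads from user code.
# Each Thread object gets its own program counter and stack, but shares the
# process address space (here, the `results` dict).
import threading

results = {}

def worker(name, n):
    # Illustrative workload; runs concurrently with the other workers.
    results[name] = sum(range(n))

threads = [threading.Thread(target=worker, args=(f"t{i}", 1000)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(results)  # four partial results, one per thread
```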
Multithreading
In computer architecture, multithreading is
the ability of a central processing
unit (CPU) (or a single core in a multi-core
processor) to provide multiple threads of
execution concurrently, supported by
the operating system.
• What are the differences between software
multithreading and hardware multithreading?
Software: OS support for several concurrent threads
– Large number of threads (effectively unlimited)
– ‘Heavy’ context switching
Hardware: CPU support for several instruction flows
– Limited number of threads (typically 2 or 4)
– ‘Light’/‘immediate’ context switching
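The contrast above can be seen from user code: the number of hardware threads is a small, fixed property of the CPU, while software threads are limited only by OS resources. A small sketch (counts are illustrative; `os.cpu_count()` reports logical CPUs, i.e. hardware threads):

```python
# Sketch: hardware threads are few and fixed; software threads are plentiful.
import os
import threading

hw_threads = os.cpu_count() or 1     # logical CPUs exposed by the hardware

# Create far more software threads than the CPU has hardware contexts;
# the OS time-slices them onto the available hardware threads.
sw_threads = [threading.Thread(target=lambda: None) for _ in range(100)]
for t in sw_threads:
    t.start()
for t in sw_threads:
    t.join()

print(f"{hw_threads} hardware threads; ran {len(sw_threads)} software threads")
```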
MULTITHREADING
TYPES
Coarse-grain
multithreading
Fine-grain
multithreading
Simultaneous
Multi-Threading
Coarse-grain Multithreading
• Threads are switched upon ‘expensive’ operations
• Single thread runs until a costly stall
– E.g. 2nd level cache miss
• Another thread starts during stall for first
– Pipeline fill time requires several cycles!
• Does not cover short stalls
• Less likely to slow execution of a single thread (smaller latency)
• Needs hardware support
– PC and register file for each thread
– Little additional hardware beyond that
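The switch-on-costly-stall policy above can be sketched as a toy simulation. All parameters are made up: each trace entry is a per-instruction stall length in cycles (0 = no stall, 20 = an L2 miss), and a thread switch costs a fixed pipeline refill time.

```python
# Toy simulation of coarse-grain multithreading: run one thread until it
# hits a costly stall, then switch, paying a pipeline refill penalty.
REFILL = 3   # assumed pipeline fill time after a switch, in cycles

def coarse_grain(threads):
    """threads: per-thread instruction traces; each entry is the stall
    length in cycles (0 = issues immediately, e.g. 20 = L2 cache miss)."""
    schedule, current, cycle = [], 0, 0
    pcs = [0] * len(threads)                      # per-thread PCs
    while any(pc < len(tr) for pc, tr in zip(pcs, threads)):
        pc = pcs[current]
        if pc < len(threads[current]) and threads[current][pc] == 0:
            schedule.append((cycle, current))      # issue one instruction
            pcs[current] += 1
            cycle += 1
        else:
            # Costly stall (or thread finished): switch threads. The miss
            # is serviced while the other thread runs, so mark it resolved.
            if pc < len(threads[current]):
                threads[current][pc] = 0
            current = (current + 1) % len(threads)
            cycle += REFILL                        # pipeline refill cost
    return schedule

# Thread 0 misses in the L2 after two instructions; thread 1 covers the stall.
print(coarse_grain([[0, 0, 20, 0], [0, 0, 0, 0]]))
```

Note how the short refill gaps (cycles 2–4 and 9–11 in this run) are the cost that makes coarse-grain switching unsuitable for covering short stalls.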
Fine-grain Multithreading
• Threads are switched every single cycle among the ‘ready’
threads
• Two or more threads interleave instructions
– Round-robin fashion
– Skip stalled threads
• Needs hardware support
– Separate PC and register file for each thread
– Hardware to control alternating pattern
• Naturally hides delays
– Data hazards, Cache misses
– Pipeline runs with rare stalls
• Does not make full use of a multi-issue architecture
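The cycle-by-cycle round-robin policy above can be sketched as a toy simulation (all traces and ready cycles are made up for the example):

```python
# Toy simulation of fine-grain multithreading: one instruction issues per
# cycle, rotating round-robin over the threads and skipping any thread
# whose next instruction is not ready yet.
def fine_grain(threads, cycles):
    """threads: per-thread lists of (label, ready_cycle) instructions;
    ready_cycle models when a stall (e.g. a cache miss) resolves."""
    pcs = [0] * len(threads)           # per-thread program counters
    schedule = []
    current = 0                        # round-robin pointer
    for cycle in range(cycles):
        for off in range(len(threads)):
            t = (current + off) % len(threads)
            pc = pcs[t]
            if pc < len(threads[t]) and threads[t][pc][1] <= cycle:
                schedule.append((cycle, threads[t][pc][0]))
                pcs[t] += 1
                current = (t + 1) % len(threads)
                break                  # one instruction per cycle
    return schedule

# Thread A's third instruction waits on a cache miss until cycle 6;
# Thread B's instructions naturally hide most of the delay.
a = [("a1", 0), ("a2", 0), ("a3", 6)]
b = [("b1", 0), ("b2", 0), ("b3", 0)]
print(fine_grain([a, b], 8))
```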
Simultaneous Multi-Threading
• The main idea is to exploit instruction-level
parallelism and thread-level parallelism at the
same time
• In a superscalar processor, issue instructions from
different threads in the same cycle
– Schedule as many ‘ready’ instructions as possible
– Operand reading and result saving become
much more complex
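The issue policy above can be sketched as a toy simulation of a 2-wide superscalar: each cycle, up to two ready instructions are issued from any thread (the width and the traces are illustrative, not from a real core):

```python
# Toy simulation of SMT issue on a 2-wide superscalar: each cycle fills
# its issue slots with ready instructions from any thread.
ISSUE_WIDTH = 2   # assumed superscalar width

def smt_issue(threads, cycles):
    """threads: per-thread lists of (label, ready_cycle) instructions."""
    pcs = [0] * len(threads)
    schedule = []
    for cycle in range(cycles):
        issued = 0
        for t, trace in enumerate(threads):
            # Take as many consecutive ready instructions as fit this cycle.
            while (issued < ISSUE_WIDTH and pcs[t] < len(trace)
                   and trace[pcs[t]][1] <= cycle):
                schedule.append((cycle, trace[pcs[t]][0]))
                pcs[t] += 1
                issued += 1
    return schedule

# Thread A stalls on a cache miss ('c' not ready until cycle 4);
# Thread B's instructions keep both issue slots busy in the meantime.
a = [("a", 0), ("b", 0), ("c", 4), ("d", 4)]
b = [("M", 0), ("N", 0), ("P", 1), ("Q", 2)]
print(smt_issue([a, b], 6))
```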
Simultaneous Multi-Threading
• Let’s look simply at instruction issue:

[Pipeline diagram: we want to run two threads. Thread A’s instructions
(a, b, c, d, e) and Thread B’s (M, N, P, Q, R) share the five-stage
pipeline (IF ID EX MEM WB), entering in the interleaved order
a, b, M, N, c, P, Q, d, e, R over cycles 1–10.]

Run on its own, each thread wastes cycles on instruction-cache misses
(ICM). Under SMT, as many ‘ready’ instructions as possible are issued
each cycle, so one thread’s instructions fill the issue slots left
empty by the other thread’s misses.
SMT ISSUES WITH IN-ORDER PROCESSORS
• Asymmetric pipeline stall
– One part of the pipeline stalls; we want the other parts to
continue
• Overtaking – non-stalled threads should be able to progress
• What happens if a ready thread is stuck behind a stalled one?
SMT issues with in-order processors
• Cache misses – abort the missing instruction (and the instructions
in its shadow, if it is a D-cache miss) upon the miss
• Most existing implementations are for out-of-order,
register-renamed architectures (akin to Tomasulo's algorithm)
– e.g. PowerPC, Intel Hyper-Threading
SIMULTANEOUS MULTI THREADING
• Extracts the most parallelism from instructions and threads
• Implemented mostly in out-of-order processors because they
are the only ones able to exploit that much parallelism
• Has a significant hardware overhead
• Replicate (and MUX) thread state (registers, TLBs, etc)
• Operand reading and result saving increase datapath
complexity
• Per-thread instruction handling/scheduling engine in out-of-
order implementations
BENEFITS OF HW MT
• Multithreading techniques improve the utilisation of
processor resources and, hence, the overall performance
• If the different threads are accessing the same input data
they may be using the same regions of memory
• Cache efficiency improves in these cases
DISADVANTAGES OF HW MT
• Single-thread performance may be degraded when compared to
a single-thread CPU
• Multiple threads interfere with each other
• Shared caches mean that, effectively, threads would use a fraction
of the whole cache
• Thrashing may exacerbate this issue
• Thread scheduling at hardware level adds high complexity to
processor design
• Thread state, managing priorities, OS-level information, …
Some Advanced Uses of Multithreading
SPECULATIVE EXECUTION
• When reaching a conditional branch we could spawn two
threads
• One runs the true path
• The other runs the false path
• Once we know which path is correct,
kill the other thread
• The effects of control hazards are alleviated
• Supported by current OoO CPUs
• But not as a full-fledged thread
• Can reach several levels of nested conditions
• Requires memory support (e.g. reorder buffers)
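A software analogy of the dual-path idea above: evaluate both sides of a branch concurrently, then keep the result of the correct path once the condition resolves. Real CPUs do this in hardware with reorder buffers; this sketch (all function names are made up) only mirrors the control flow.

```python
# Sketch of eager (dual-path) speculation in software: run both branch
# paths as threads, then discard the wrong one's result.
from concurrent.futures import ThreadPoolExecutor

def speculate(cond_fn, true_path, false_path):
    with ThreadPoolExecutor(max_workers=2) as pool:
        t = pool.submit(true_path)    # speculative 'true' thread
        f = pool.submit(false_path)   # speculative 'false' thread
        taken = cond_fn()             # the branch condition resolves meanwhile
        # Keep the correct result; the other thread's work is thrown away
        # (a real CPU would squash it via the reorder buffer).
        return t.result() if taken else f.result()

print(speculate(lambda: 7 % 2 == 1, lambda: "odd", lambda: "even"))
```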
MEMORY PREFETCHING
• Compile applications into two threads
• One runs the whole application
• The other thread (the scout thread) only has the memory accesses
• The scout thread runs ahead and fetches memory in advance
• Ensures data will be in the cache when the original thread needs it
• Cache hit rate increases
• Synchronisation is needed
– The scout has to run far enough ahead that the memory delay is hidden
– But not so far ahead that it replaces useful data in the cache
– Beware thrashing!
[Figure: run single-threaded, the original thread suffers a string of
cache misses (CM); with a scout thread running ahead, the original
thread's accesses become cache hits (CH).]
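The scout idea above can be mimicked in software. In this sketch (everything is illustrative: `slow_load` stands in for an access that misses in cache, and a shared dict plays the role of the cache), the scout warms the cache ahead of the main thread:

```python
# Software analogy of a scout (prefetch) thread warming a shared cache.
import threading
import time

DATA = list(range(20))
cache = {}                     # stands in for the hardware cache
LEAD = 4                       # how far ahead we let the scout get (crude sync)

def slow_load(i):
    time.sleep(0.001)          # pretend this is a cache-miss latency
    return DATA[i] * DATA[i]

def scout():
    # Only the memory accesses of the program: prefetch every element.
    for i in range(len(DATA)):
        if i not in cache:
            cache[i] = slow_load(i)

def main_thread():
    total = 0
    for i in range(len(DATA)):
        # Hit if the scout got here first, otherwise pay the miss ourselves.
        total += cache[i] if i in cache else slow_load(i)
    return total

t = threading.Thread(target=scout)
t.start()
time.sleep(0.001 * LEAD)       # let the scout run ahead before starting
result = main_thread()
t.join()
print(result)                  # sum of squares of 0..19
```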
SLIPSTREAMING
• Compile sequential applications into two threads
• One runs the application itself
• The slipstream thread only runs the critical path of the
application
• The slipstream thread runs ahead and passes results back
• The delay of slow operations (e.g. floating-point division) is
improved
• Synchronisation and communication among the threads is
needed
• Requires extra hardware to deal with this ‘special’ behaviour
• Could be used in multicore as well
[Figure: the slipstream thread executes only the critical-path
instructions, running ahead of the original thread, which executes
both the critical and non-critical work.]
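A software analogy of the scheme above (the division standing in for the slow critical-path operation, and a queue for the hardware communication channel, are both illustrative choices):

```python
# Sketch of slipstreaming in software: the slipstream thread runs only the
# critical path (the slow divisions) ahead of time and passes its results
# to the main thread through a queue.
import queue
import threading

values = [10.0, 20.0, 30.0]
results_q = queue.Queue()      # stands in for the inter-thread channel

def slipstream():
    # Critical path only: the slow floating-point divisions.
    for v in values:
        results_q.put(v / 4.0)

def main_program():
    out = []
    for _ in values:
        d = results_q.get()    # result already computed by the slipstream
        out.append(d + 1.0)    # non-critical work stays in this thread
    return out

t = threading.Thread(target=slipstream)
t.start()
out = main_program()
t.join()
print(out)
```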
MULTITHREADING SUMMARY
• A cost-effective way of finding additional parallelism for the CPU
pipeline
• Available in x86, Itanium, Power and SPARC
• Intel Hyper-threading (SMT)
• PowerPC uses SMT
• UltraSPARC T1/T2 used fine-grain multithreading; later models moved to SMT
• SPARC64 VI used coarse-grain multithreading; later models moved to SMT
• Each additional hardware thread is presented to the
operating system as an additional virtual CPU
• Multiprocessor OS is required
THANK YOU
Any Questions?