HARDWARE
MULTITHREADING
JAHANGIR ABBAS 15091519-091
SAAD MATEEN 15091519-098
SHAFAQAT ALI 15091519-137
What is a Thread?
A thread is a flow of execution through the
process code, with its own program counter
that keeps track of which instruction to
execute next, system registers which hold its
current working variables, and a stack which
contains the execution history.
A thread is also called a lightweight
process.
Types of Thread
Threads are implemented in the following two ways:
•User Level Threads − threads managed in user space, without kernel involvement.
•Kernel Level Threads − threads managed directly by the operating
system kernel.
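As a concrete illustration, kernel-level threads can be created from user code; in CPython each `threading.Thread` maps onto one OS-managed thread. A minimal sketch (the worker function and names are made up for the example):

```python
# Minimal sketch: spawning OS-managed (kernel-level) threads from user code.
# Each Thread object gets its own program counter and stack, but shares the
# process address space (here, the `results` dict).
import threading

results = {}

def worker(name, n):
    # Illustrative workload; runs concurrently with the other workers.
    results[name] = sum(range(n))

threads = [threading.Thread(target=worker, args=(f"t{i}", 1000)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(results)  # four partial results, one per thread
```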
Multithreading
In computer architecture, multithreading is
the ability of a central processing
unit (CPU) (or a single core in a multi-core
processor) to provide multiple threads of
execution concurrently, supported by
the operating system.
• What are the differences between software
multithreading and hardware multithreading?
Software: OS support for several concurrent threads
– Large number of threads (effectively unlimited)
– ‘Heavy’ context switching
Hardware: CPU support for several instruction flows
– Limited number of threads (typically 2 or 4)
– ‘Light’/‘immediate’ context switching
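The contrast above can be seen from user code: the number of hardware threads is a small, fixed property of the CPU, while software threads are limited only by OS resources. A small sketch (counts are illustrative; `os.cpu_count()` reports logical CPUs, i.e. hardware threads):

```python
# Sketch: hardware threads are few and fixed; software threads are plentiful.
import os
import threading

hw_threads = os.cpu_count() or 1     # logical CPUs exposed by the hardware

# Create far more software threads than the CPU has hardware contexts;
# the OS time-slices them onto the available hardware threads.
sw_threads = [threading.Thread(target=lambda: None) for _ in range(100)]
for t in sw_threads:
    t.start()
for t in sw_threads:
    t.join()

print(f"{hw_threads} hardware threads; ran {len(sw_threads)} software threads")
```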
MULTITHREADING
TYPES
Coarse-grain
multithreading
Fine-grain
multithreading
Simultaneous
Multi-Threading
Coarse-grain Multithreading
• Threads are switched upon ‘expensive’ operations
• Single thread runs until a costly stall
– E.g. 2nd level cache miss
• Another thread starts during stall for first
– Pipeline fill time requires several cycles!
• Does not cover short stalls
• Less likely to slow execution of a single thread (smaller latency)
• Needs hardware support
– PC and register file for each thread
– Little additional hardware beyond that
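The switch-on-costly-stall policy above can be sketched as a toy simulation. All parameters are made up: each trace entry is a per-instruction stall length in cycles (0 = no stall, 20 = an L2 miss), and a thread switch costs a fixed pipeline refill time.

```python
# Toy simulation of coarse-grain multithreading: run one thread until it
# hits a costly stall, then switch, paying a pipeline refill penalty.
REFILL = 3   # assumed pipeline fill time after a switch, in cycles

def coarse_grain(threads):
    """threads: per-thread instruction traces; each entry is the stall
    length in cycles (0 = issues immediately, e.g. 20 = L2 cache miss)."""
    schedule, current, cycle = [], 0, 0
    pcs = [0] * len(threads)                      # per-thread PCs
    while any(pc < len(tr) for pc, tr in zip(pcs, threads)):
        pc = pcs[current]
        if pc < len(threads[current]) and threads[current][pc] == 0:
            schedule.append((cycle, current))      # issue one instruction
            pcs[current] += 1
            cycle += 1
        else:
            # Costly stall (or thread finished): switch threads. The miss
            # is serviced while the other thread runs, so mark it resolved.
            if pc < len(threads[current]):
                threads[current][pc] = 0
            current = (current + 1) % len(threads)
            cycle += REFILL                        # pipeline refill cost
    return schedule

# Thread 0 misses in the L2 after two instructions; thread 1 covers the stall.
print(coarse_grain([[0, 0, 20, 0], [0, 0, 0, 0]]))
```

Note how the short refill gaps (cycles 2–4 and 9–11 in this run) are the cost that makes coarse-grain switching unsuitable for covering short stalls.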
Fine-grain Multithreading
• Threads are switched every single cycle among the ‘ready’
threads
• Two or more threads interleave instructions
– Round-robin fashion
– Skip stalled threads
• Needs hardware support
– Separate PC and register file for each thread
– Hardware to control alternating pattern
• Naturally hides delays
– Data hazards, Cache misses
– Pipeline runs with rare stalls
• Does not make full use of a multi-issue architecture
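The cycle-by-cycle round-robin policy above can be sketched as a toy simulation (all traces and ready cycles are made up for the example):

```python
# Toy simulation of fine-grain multithreading: one instruction issues per
# cycle, rotating round-robin over the threads and skipping any thread
# whose next instruction is not ready yet.
def fine_grain(threads, cycles):
    """threads: per-thread lists of (label, ready_cycle) instructions;
    ready_cycle models when a stall (e.g. a cache miss) resolves."""
    pcs = [0] * len(threads)           # per-thread program counters
    schedule = []
    current = 0                        # round-robin pointer
    for cycle in range(cycles):
        for off in range(len(threads)):
            t = (current + off) % len(threads)
            pc = pcs[t]
            if pc < len(threads[t]) and threads[t][pc][1] <= cycle:
                schedule.append((cycle, threads[t][pc][0]))
                pcs[t] += 1
                current = (t + 1) % len(threads)
                break                  # one instruction per cycle
    return schedule

# Thread A's third instruction waits on a cache miss until cycle 6;
# Thread B's instructions naturally hide most of the delay.
a = [("a1", 0), ("a2", 0), ("a3", 6)]
b = [("b1", 0), ("b2", 0), ("b3", 0)]
print(fine_grain([a, b], 8))
```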
Simultaneous Multi-Threading
• The main idea is to exploit instruction-level
parallelism and thread-level parallelism at the
same time
• In a superscalar processor, issue instructions from
different threads in the same cycle
– Schedule as many ‘ready’ instructions as possible
– Operand reading and result saving become
much more complex
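The issue policy above can be sketched as a toy simulation of a 2-wide superscalar: each cycle, up to two ready instructions are issued from any thread (the width and the traces are illustrative, not from a real core):

```python
# Toy simulation of SMT issue on a 2-wide superscalar: each cycle fills
# its issue slots with ready instructions from any thread.
ISSUE_WIDTH = 2   # assumed superscalar width

def smt_issue(threads, cycles):
    """threads: per-thread lists of (label, ready_cycle) instructions."""
    pcs = [0] * len(threads)
    schedule = []
    for cycle in range(cycles):
        issued = 0
        for t, trace in enumerate(threads):
            # Take as many consecutive ready instructions as fit this cycle.
            while (issued < ISSUE_WIDTH and pcs[t] < len(trace)
                   and trace[pcs[t]][1] <= cycle):
                schedule.append((cycle, trace[pcs[t]][0]))
                pcs[t] += 1
                issued += 1
    return schedule

# Thread A stalls on a cache miss ('c' not ready until cycle 4);
# Thread B's instructions keep both issue slots busy in the meantime.
a = [("a", 0), ("b", 0), ("c", 4), ("d", 4)]
b = [("M", 0), ("N", 0), ("P", 1), ("Q", 2)]
print(smt_issue([a, b], 6))
```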
Simultaneous Multi-Threading
• Let’s look simply at instruction issue:

[Pipeline diagram: we want to run two threads. Thread A’s instructions
(a, b, c, d, e) and Thread B’s (M, N, P, Q, R) share the five-stage
pipeline (IF ID EX MEM WB), entering in the interleaved order
a, b, M, N, c, P, Q, d, e, R over cycles 1–10.]

Run on its own, each thread wastes cycles on instruction-cache misses
(ICM). Under SMT, as many ‘ready’ instructions as possible are issued
each cycle, so one thread’s instructions fill the issue slots left
empty by the other thread’s misses.
SMT ISSUES WITH IN-ORDER PROCESSORS
• Asymmetric pipeline stall
– One part of the pipeline stalls; we want the other parts to
continue
• Overtaking – non-stalled threads should be able to progress
• What happens if a ready thread is stuck behind a stalled one?
SMT issues with in-order processors
• Cache misses – abort the missing instruction (and the instructions
in its shadow, if it is a D-cache miss) upon the miss
• Most existing implementations are for out-of-order,
register-renamed architectures (akin to Tomasulo's algorithm)
– e.g. PowerPC, Intel Hyper-Threading
SIMULTANEOUS MULTI THREADING
• Extracts the most parallelism from instructions and threads
• Implemented mostly in out-of-order processors because they
are the only ones able to exploit that much parallelism
• Has a significant hardware overhead
• Replicate (and MUX) thread state (registers, TLBs, etc)
• Operand reading and result saving increase datapath
complexity
• Per-thread instruction handling/scheduling engine in out-of-
order implementations
BENEFITS OF HW MT
• Multithreading techniques improve the utilisation of
processor resources and, hence, the overall performance
• If the different threads are accessing the same input data
they may be using the same regions of memory
• Cache efficiency improves in these cases
DISADVANTAGES OF HW MT
• Single-thread performance may be degraded when compared to
a single-thread CPU
• Multiple threads interfere with each other
• Shared caches mean that, effectively, threads would use a fraction
of the whole cache
• Thrashing may exacerbate this issue
• Thread scheduling at hardware level adds high complexity to
processor design
• Thread state, managing priorities, OS-level information, …
Some Advanced Uses of Multithreading
SPECULATIVE EXECUTION
• When reaching a conditional branch we could spawn two
threads
• One runs the true path
• The other runs the false path
• Once we know which path is correct,
kill the other thread
• The effects of control hazards are alleviated
• Supported by current OoO CPUs
• But not as a full-fledged thread
• Can reach several levels of nested conditions
• Requires memory support (e.g. reorder buffers)
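A software analogy of the dual-path idea above: evaluate both sides of a branch concurrently, then keep the result of the correct path once the condition resolves. Real CPUs do this in hardware with reorder buffers; this sketch (all function names are made up) only mirrors the control flow.

```python
# Sketch of eager (dual-path) speculation in software: run both branch
# paths as threads, then discard the wrong one's result.
from concurrent.futures import ThreadPoolExecutor

def speculate(cond_fn, true_path, false_path):
    with ThreadPoolExecutor(max_workers=2) as pool:
        t = pool.submit(true_path)    # speculative 'true' thread
        f = pool.submit(false_path)   # speculative 'false' thread
        taken = cond_fn()             # the branch condition resolves meanwhile
        # Keep the correct result; the other thread's work is thrown away
        # (a real CPU would squash it via the reorder buffer).
        return t.result() if taken else f.result()

print(speculate(lambda: 7 % 2 == 1, lambda: "odd", lambda: "even"))
```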
MEMORY PREFETCHING
• Compile applications into two threads
• One runs the whole application
• The other thread (the scout thread) only has the memory accesses
• The scout thread runs ahead and fetches memory in advance
• Ensures data will be in the cache when the original thread needs it
• Cache hit rate increases
• Synchronisation is needed
– The scout has to run far enough ahead that the memory delay is hidden
– But not so far ahead that it replaces useful data in the cache
– Beware thrashing!
[Figure: run single-threaded, the original thread suffers a string of
cache misses (CM); with a scout thread running ahead, the original
thread's accesses become cache hits (CH).]
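The scout idea above can be mimicked in software. In this sketch (everything is illustrative: `slow_load` stands in for an access that misses in cache, and a shared dict plays the role of the cache), the scout warms the cache ahead of the main thread:

```python
# Software analogy of a scout (prefetch) thread warming a shared cache.
import threading
import time

DATA = list(range(20))
cache = {}                     # stands in for the hardware cache
LEAD = 4                       # how far ahead we let the scout get (crude sync)

def slow_load(i):
    time.sleep(0.001)          # pretend this is a cache-miss latency
    return DATA[i] * DATA[i]

def scout():
    # Only the memory accesses of the program: prefetch every element.
    for i in range(len(DATA)):
        if i not in cache:
            cache[i] = slow_load(i)

def main_thread():
    total = 0
    for i in range(len(DATA)):
        # Hit if the scout got here first, otherwise pay the miss ourselves.
        total += cache[i] if i in cache else slow_load(i)
    return total

t = threading.Thread(target=scout)
t.start()
time.sleep(0.001 * LEAD)       # let the scout run ahead before starting
result = main_thread()
t.join()
print(result)                  # sum of squares of 0..19
```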
SLIPSTREAMING
• Compile sequential applications into two threads
• One runs the application itself
• The slipstream thread only runs the critical path of the
application
• The slipstream thread runs ahead and passes results back
• The delay of slow operations (e.g. floating-point division) is
improved
• Synchronisation and communication among the threads is
needed
• Requires extra hardware to deal with this ‘special’ behaviour
• Could be used in multicore as well
[Figure: the slipstream thread executes only the critical-path
instructions, running ahead of the original thread, which executes
both the critical and non-critical work.]
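A software analogy of the scheme above (the division standing in for the slow critical-path operation, and a queue for the hardware communication channel, are both illustrative choices):

```python
# Sketch of slipstreaming in software: the slipstream thread runs only the
# critical path (the slow divisions) ahead of time and passes its results
# to the main thread through a queue.
import queue
import threading

values = [10.0, 20.0, 30.0]
results_q = queue.Queue()      # stands in for the inter-thread channel

def slipstream():
    # Critical path only: the slow floating-point divisions.
    for v in values:
        results_q.put(v / 4.0)

def main_program():
    out = []
    for _ in values:
        d = results_q.get()    # result already computed by the slipstream
        out.append(d + 1.0)    # non-critical work stays in this thread
    return out

t = threading.Thread(target=slipstream)
t.start()
out = main_program()
t.join()
print(out)
```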
MULTITHREADING SUMMARY
• A cost-effective way of finding additional parallelism for the CPU
pipeline
• Available in x86, Itanium, Power and SPARC
• Intel Hyper-threading (SMT)
• PowerPC uses SMT
• UltraSPARC T1/T2 used fine-grain multithreading; later models moved to SMT
• SPARC64 VI used coarse-grain multithreading; later models moved to SMT
• Each additional hardware thread is presented to the
operating system as an additional virtual CPU
• Multiprocessor OS is required
THANK YOU
Any Questions?