Hyper Threading
Concepts & Architecture
Agenda
Technical Journey to Hyperthreading
Hardware & Software Requirements
Performance Issues
Hyperthreading
Timeline
Cluster
Symmetric Multi Processing (SMP)
Hyperthreading
Now, parallel computing is available on a single processor
Parallel Computing - Goals
Parallel computing is when a program uses concurrency to either:
• decrease the runtime for the solution to a problem, or
• increase the size of the problem that can be solved.
Single-threaded Processor
Parts of Processor:
• Front-end: fetching/decoding/reordering
• Execution core: actual execution
Concurrency Illusion:
• Multiple programs in memory
• Only one executes at a time
[Figures: 4-issue CPU with bubbles; 7-unit CPU with pipeline bubbles]
Time-slicing via context switching
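Time-slicing can be illustrated with a small round-robin model. This is only a sketch: the function name `time_slice`, the `quantum` parameter, and the "work units" are assumptions made for illustration, not a description of any real OS scheduler. It shows that several programs are resident at once, yet only one runs in any given slice.

```python
from collections import deque


def time_slice(programs, quantum):
    """Round-robin model of time-slicing on a single-threaded CPU.

    programs: dict mapping a program name to its remaining work units.
    quantum:  work units a program may run before it is switched out.
    Returns the execution trace as a list of (name, units_run) tuples.
    """
    ready = deque(programs.items())
    trace = []
    while ready:
        name, remaining = ready.popleft()      # context switch in
        run = min(quantum, remaining)
        trace.append((name, run))              # only this program executes
        if remaining > run:
            ready.append((name, remaining - run))  # context switch out
    return trace


if __name__ == "__main__":
    for name, units in time_slice({"A": 3, "B": 5}, quantum=2):
        print(f"{name} runs for {units} unit(s)")
```

Every entry in the trace belongs to exactly one program, which is the "illusion" of concurrency: the programs take turns, they never overlap.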
Single-threaded SMP
What is SMP?
• “Symmetric Multi-Processors”
• Tolerably mislabeled as “Shared-Memory Processors”
• Processors all connected to a (large) memory
• UMA: Uniform Memory Access, which makes it easy to program
• Symmetric: all memory is equally close to all processors
• Cache coherence via “snoopy caches”
Two threads execute at once, so threads spend less time waiting
Twice as much speed and twice as much waste
Super-threading
[Time-Slice Multithreading]
Principle: the processor can execute
more than one thread at a time
Requires more hardware cleverness
logic switches at each cycle
Leads to less Waste
Just a finer grain of interleaving
BUT, each stage of the front end or the
execution core only runs instructions
from ONE thread!
Does not help with poor instruction
parallelism within one thread
Simultaneous Multi Threading (SMT)
Principle: the processor can execute more
than one thread at a time, even within a
single clock cycle!!
Requires even more hardware cleverness
logic switches within each cycle
Finest level of interleaving
From the OS perspective, there are two
“logical” processors
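The difference between super-threading and SMT can be sketched with a toy issue-slot model. Everything here is an assumption made for illustration: each thread is a list of "instruction groups" (the number of independent instructions it can issue in one cycle), `width` is the number of issue slots per cycle, and the function name `simulate` is hypothetical. It is not a model of any real pipeline.

```python
def simulate(threads, width, smt):
    """Count cycles and issue-slot utilization for a toy CPU.

    threads: list of lists; each inner list holds the sizes of the
             instruction groups one thread can issue, one group per cycle.
    width:   issue slots available per cycle.
    smt:     False -> super-threading (one thread per cycle),
             True  -> SMT (several threads may share one cycle).
    Returns (cycles, utilization).
    """
    pending = [list(t) for t in threads]
    total = sum(sum(t) for t in threads)
    cycles = 0
    while any(pending):
        free = width
        n = len(pending)
        # Rotate priority so the threads take turns going first.
        order = [(cycles + k) % n for k in range(n)]
        for i in order:
            if not pending[i] or free == 0:
                continue
            issued = min(free, pending[i][0])
            pending[i][0] -= issued
            if pending[i][0] == 0:
                pending[i].pop(0)
            free -= issued
            if not smt:
                break  # super-threading: slots left over this cycle are wasted
        cycles += 1
    return cycles, total / (cycles * width)


if __name__ == "__main__":
    a = [2, 1, 2, 1]  # thread A: low instruction-level parallelism
    b = [1, 2, 1, 2]  # thread B: low instruction-level parallelism
    print("super-threading:", simulate([a, b], width=4, smt=False))
    print("SMT:            ", simulate([a, b], width=4, smt=True))
```

With these two low-parallelism threads on a 4-wide machine, the super-threaded run needs one cycle per group, while the SMT run packs groups from both threads into the same cycle and finishes in half the cycles.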
Evolution of Hyper-Threading
Two ways of faster computing
Increase Clock Speed
Better utilization of resources
Clock speed cannot be increased beyond a certain limit
A lot of heat is generated
Better utilization of resources is now the choice
Memory access takes relatively more time
During this interval, CPU resources can be used by other threads
This requires out-of-order execution, register renaming, ...
Hyper Threading
With these points in mind, Intel came up
with its version of Simultaneous Multi
Threading (SMT) called Hyper Threading
(HT)
Hardware Requirements
Because the additional threads all run on the same CPU elements
(FPU, ALU), the only additions needed are to the initial
scheduling process.
Although hyper-threading might seem like a pretty large departure
from the kind of conventional, process-switching multithreading done
on a single-threaded CPU, it actually doesn't add too much
complexity to the hardware.
Intel reports that adding hyper-threading to their Xeon processor
added only 5% to its die area.
Intel Xeon – Case Study
Capable of executing at most two threads in parallel on two logical
processors.
Must be able to maintain information for two distinct and independent
thread contexts.
Done by dividing up the processor's micro-architectural resources into
three types:
replicated
partitioned
shared
Intel Xeon – Resources Division
Replicated
• Register renaming logic
• Instruction Pointer
• ITLB
• Return stack predictor
• Various other architectural registers
Partitioned
• Re-order buffers (ROBs)
• Load/Store buffers
• Various queues, like the scheduling queues, uop queue, etc.
Shared
• Caches: trace cache, L1, L2, L3
• Micro-architectural registers
• Execution Units
Replicated Resources
Some resources have to be replicated, for example:
Instruction Pointer
1 Instruction Pointer for each logical processor.
Xeon: 2 Instruction Pointers
Register Allocation Table
Maps the architectural registers (8 integer and 8 floating-point) onto
128 General Purpose Registers and 128 Floating Point Registers
A replicated resource managing a shared resource
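The idea of a replicated table managing a shared register pool can be sketched as follows. This is a simplified illustration, not Intel's implementation: the class names `SharedRegisterFile` and `RegisterAllocationTable` are hypothetical, and the old physical register is freed immediately on rename, whereas real hardware would free it only when the instruction retires.

```python
class SharedRegisterFile:
    """Pool of physical registers shared by all logical processors."""

    def __init__(self, size=128):
        self.free = list(range(size))

    def allocate(self):
        if not self.free:
            raise RuntimeError("no free physical register: renaming stalls")
        return self.free.pop(0)

    def release(self, phys):
        self.free.append(phys)


class RegisterAllocationTable:
    """One table per logical processor (the replicated resource)."""

    def __init__(self, pool):
        self.pool = pool      # the shared resource
        self.mapping = {}     # architectural name -> physical register

    def rename_dest(self, arch_reg):
        """Give arch_reg a fresh physical register for a new result."""
        old = self.mapping.get(arch_reg)
        phys = self.pool.allocate()
        self.mapping[arch_reg] = phys
        if old is not None:
            # Simplification: freed at once instead of at retirement.
            self.pool.release(old)
        return phys

    def lookup(self, arch_reg):
        return self.mapping[arch_reg]


if __name__ == "__main__":
    pool = SharedRegisterFile(size=128)
    rat0 = RegisterAllocationTable(pool)   # logical processor 0
    rat1 = RegisterAllocationTable(pool)   # logical processor 1
    print("LP0 eax ->", rat0.rename_dest("eax"))
    print("LP1 eax ->", rat1.rename_dest("eax"))
```

Both logical processors use the same architectural name `eax`, but each table maps it to a different physical register drawn from the one shared pool, so the two thread contexts never collide.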
Partitioned Resources
Queues are partitioned resources
[Figures: Statically Partitioned Queue; Dynamically Partitioned Queue]
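The two partitioning styles can be contrasted with a toy queue model. This is a sketch under stated assumptions: the class `PartitionedQueue`, the helper `fill`, and the `per_thread_cap` knob are hypothetical names, and the per-thread cap in dynamic mode is an assumed policy to stop one thread from monopolizing the queue.

```python
class PartitionedQueue:
    """Toy model of a queue shared by the logical processors."""

    def __init__(self, capacity, static, per_thread_cap=None, threads=2):
        self.capacity = capacity
        self.static = static
        self.threads = threads
        # Hypothetical knob: in dynamic mode, the most entries one thread may hold.
        self.per_thread_cap = capacity if per_thread_cap is None else per_thread_cap
        self.entries = []  # list of (thread_id, uop)

    def occupancy(self, thread_id):
        return sum(1 for tid, _ in self.entries if tid == thread_id)

    def try_insert(self, thread_id, uop):
        if self.static:
            # Static: each thread owns a fixed, equal share of the entries.
            allowed = self.occupancy(thread_id) < self.capacity // self.threads
        else:
            # Dynamic: any free entry may be used, up to the per-thread cap.
            allowed = (len(self.entries) < self.capacity
                       and self.occupancy(thread_id) < self.per_thread_cap)
        if allowed:
            self.entries.append((thread_id, uop))
        return allowed


def fill(queue, thread_id, attempts):
    """Try to insert `attempts` uops; return how many were accepted."""
    return sum(queue.try_insert(thread_id, f"uop{i}") for i in range(attempts))


if __name__ == "__main__":
    static_q = PartitionedQueue(capacity=8, static=True)
    dynamic_q = PartitionedQueue(capacity=8, static=False, per_thread_cap=6)
    print("static, thread 0 alone: ", fill(static_q, 0, 8))
    print("dynamic, thread 0 alone:", fill(dynamic_q, 0, 8))
```

In the static case a lone thread can use only half of the queue even when the other thread is idle; in the dynamic case it can grow into the unused entries, up to the cap.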
Shared Resources
Heart of Hyperthreading:
The more resources are shared, the more efficient Hyperthreading is at
squeezing the maximum amount of computing power out of the
minimum amount of die space
Such resources are: registers, load/store units
Shared resources are SMT-unaware
Hyper-Threading Architecture
Overview
Confusing Notions
Is a Hyper-Threaded processor the same as a dual-core
processor?
Answer: NO
Hyper-Threaded = 2 Logical Processors
Dual-Core = 2 Actual Processors on single chip
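On Linux, the logical/physical distinction can be observed by comparing the number of `processor` entries in `/proc/cpuinfo` with the number of distinct (`physical id`, `core id`) pairs. The sketch below assumes the x86 Linux layout of that file (blank-line separated blocks of `key : value` lines); the function name `count_processors` and the sample text are made up for illustration.

```python
def count_processors(cpuinfo_text):
    """Return (logical_processors, physical_cores) from /proc/cpuinfo text.

    Assumes the x86 Linux format: one blank-line separated block per
    logical processor, with 'processor', 'physical id' and 'core id' fields.
    """
    logical = 0
    cores = set()
    for block in cpuinfo_text.strip().split("\n\n"):
        fields = {}
        for line in block.splitlines():
            if ":" in line:
                key, value = line.split(":", 1)
                fields[key.strip()] = value.strip()
        if "processor" not in fields:
            continue
        logical += 1
        cores.add((fields.get("physical id"), fields.get("core id")))
    return logical, len(cores)


# Made-up sample: one single-core package with Hyper-Threading enabled.
SAMPLE = """processor\t: 0
physical id\t: 0
core id\t\t: 0

processor\t: 1
physical id\t: 0
core id\t\t: 0
"""

if __name__ == "__main__":
    logical, physical = count_processors(SAMPLE)
    print(f"{logical} logical processors on {physical} physical core(s)")
```

On a real Linux machine the same function could be fed the contents of `/proc/cpuinfo`. A hyper-threaded single-core CPU reports two logical processors on one core, while a dual-core CPU without HT reports two logical processors on two cores.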
HT – System Requirements
HT enabled Processor
Pentium 4 3.06 GHz, Xeon
HT enabled Chipsets
Intel 945G Express
HT enabled System BIOS
HT enabled Operating System
Windows 2000, XP, Linux 2.4.12
HT – Requirements from User
Enable HT in BIOS
To utilize HT
Use multi-threaded applications
OR
Run multiple applications at same time
Performance Issues - 1
2 Logical Processors != Double Power
Less CPU-intensive programs may not
show much gain
Reported gains are 20-40%
Performance Issues - 2
Death-Traps
Main cause: shared resources
Xeon philosophy <-> cooperative multitasking OS
Cases:
Floating Point Unit (FPU):
One floating-point-intensive thread takes up the FPU; another similar thread
contending for the same FPU gets stalled
Cache:
No cache-coherency problem as in SMP
But cache conflicts between the two logical processors
Worst case: two threads accessing different parts of memory and sharing no data =>
a lot of thrashing
Benchmark results: non-SMT may perform better
With the wrong mix of code, hyper-threading decreases performance
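The worst case for the shared cache can be reproduced with a small LRU model. This is an illustrative sketch only: the fully associative LRU policy, the 8-line capacity, and the address streams are assumptions chosen to make the effect visible, and the names `hit_rate` and `interleave` are hypothetical.

```python
from collections import OrderedDict


def hit_rate(accesses, capacity):
    """Hit rate of a fully associative LRU cache holding `capacity` lines."""
    cache = OrderedDict()
    hits = 0
    for addr in accesses:
        if addr in cache:
            hits += 1
            cache.move_to_end(addr)        # mark as most recently used
        else:
            cache[addr] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict least recently used line
    return hits / len(accesses)


def interleave(a, b):
    """Alternate accesses from two threads, as if they ran simultaneously."""
    out = []
    for x, y in zip(a, b):
        out += [x, y]
    return out


# Each thread loops over its own 6 lines; the two threads share no data.
thread_a = [i % 6 for i in range(600)]
thread_b = [100 + (i % 6) for i in range(600)]

if __name__ == "__main__":
    print("thread A alone:     ", hit_rate(thread_a, capacity=8))
    print("A and B, same cache:", hit_rate(interleave(thread_a, thread_b), capacity=8))
```

Each working set fits in the cache on its own, but together the two threads touch 12 lines in an 8-line cache, so under LRU every line is evicted before it is reused and the hit rate collapses.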
HT Hardware Hands-On
Need to enable Hyperthreading through the BIOS
Simple Test:
Run both tasks together, with and without HT:
Compress a 1 GB file
Play Windows Media Player with a visualization plug-in
Compare the time taken in the 2 cases
Good benchmark: Embarrassingly Parallel (EP) from the NAS Parallel Benchmarks (NASA)
4 Processors View in Task Manager
Key Point
Hyper-Threading Technology gives better
utilization of processor resources
Hyper-Threading Technology gives more
computing power for multithreaded
applications
Thread Level Parallelism on single
processor
References
"Hyper-Threading Technology." Intel.
Deborah T. Marr, Frank Binns, David L. Hill, Glenn Hinton, David A. Koufaty, J. Alan Miller, Michael Upton. "Hyper-Threading Technology Architecture and Microarchitecture." Intel.
Susan Eggers, Hank Levy, Steve Gribble. Simultaneous Multithreading Project. University of Washington.
Susan Eggers, Joel Emer, Henry Levy, Jack Lo, Rebecca Stamm, and Dean Tullsen. "Simultaneous Multithreading: A Platform for Next-generation Processors." IEEE Micro, September/October 1997, pages 12-18.
Jack Lo, Susan Eggers, Joel Emer, Henry Levy, Rebecca Stamm, and Dean Tullsen. "Converting Thread-Level Parallelism Into Instruction-Level Parallelism via Simultaneous Multithreading." ACM Transactions on Computer Systems, August 1997, pages 322-354.
Thank You