Multithreading,
Superscalar,
Intel's Hyper-Threading (HT)
Contents
Using ILP support to exploit thread-level parallelism
Performance and efficiency in advanced multiple-issue processors
Threads
A thread is a basic unit of CPU utilization.
From the processor's point of view, a thread behaves like a separate process, with its own instructions and data.
A thread may represent a process that is part of a parallel program
consisting of multiple processes, or it may represent an
independent program.
It comprises a thread ID, a program counter, a register set, and a stack.
It shares its code section, data section, and other operating-system
resources, such as open files and signals, with other threads
belonging to the same process.
A traditional process has a single thread of control. If a process has
multiple threads of control, it can perform more than one task at a time.
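To make this concrete, here is a minimal sketch using POSIX threads (the counter and function names are illustrative, not from the slides): two threads in one process update the same variable in the shared data section, while each runs on its own stack with its own program counter and registers. Compile with cc -pthread.

    /* Two threads sharing the data section of one process. */
    #include <pthread.h>
    #include <stdio.h>

    static int shared_counter = 0;                 /* shared data section */
    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

    static void *worker(void *arg)
    {
        for (int i = 0; i < 1000; i++) {           /* i lives on this thread's stack */
            pthread_mutex_lock(&lock);             /* protect the shared data */
            shared_counter++;
            pthread_mutex_unlock(&lock);
        }
        return NULL;
    }

    int main(void)
    {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, worker, NULL);
        pthread_create(&t2, NULL, worker, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        printf("counter = %d\n", shared_counter);  /* prints 2000 */
        return 0;
    }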
Many software packages that run
on modern desktop PCs are
multithreaded.
For example:
A word processor may have:
a thread for displaying graphics,
another thread for responding to
keystrokes from the user, and
a third thread for performing spelling
and grammar checking in the
background.
Threads also play a vital role in remote procedure call (RPC)
systems.
RPC allows interprocess communication by providing a
communication mechanism similar to ordinary function or procedure
calls.
Many operating system kernels are multithreaded; several threads
operate in the kernel, and each thread performs a specific task, such as
managing devices or handling interrupts.
Multithreading
Benefits:
1. Responsiveness: Multithreading an interactive application may allow a
program to continue running even if part of it is blocked or is performing
a lengthy operation, thereby increasing responsiveness to the user.
For example: A multithreaded web browser could still allow user
interaction in one thread while an image was being loaded in another
thread.
2. Resource sharing: By default, threads share the memory and the
resources of the process to which they belong. The benefit of sharing
code and data is that it allows an application to have several different
threads of activity within the same address space.
3. Economy: Allocating memory and resources for process creation is
costly. Because threads share the resources of the process to which they
belong, it is more economical to create and context-switch threads.
4. Utilization of multiprocessor architectures: In a multiprocessor
architecture, threads may run in parallel on different processors.
A single-threaded process can run on only one CPU, no matter how
many are available.
Multithreading on a multi-CPU machine therefore increases concurrency.
Multithreading Models
Support for threads may be provided either at the user level or at
the kernel level.
User threads are supported above the kernel and are managed
without kernel support, whereas kernel threads are supported and
managed directly by the operating system.
Many-to-One Model:
The many-to-one model maps many user-
level threads to one kernel thread.
Thread management is done by the
thread library in user space, so it is
efficient.
Because only one thread can access the kernel at a time, multiple
threads are unable to run in parallel on multiprocessors.
One-to-One Model:
The one-to-one model maps each user
thread to a kernel thread.
It provides more concurrency than the many-
to-one model. It allows multiple threads to run in
parallel on multiprocessors.
The only drawback to this model is that
creating a user thread requires creating the
corresponding kernel thread.
The overhead of creating kernel threads can
burden the performance of an application.
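For example, Linux's NPTL pthreads library implements the one-to-one model: every pthread_create() call creates a corresponding kernel thread. A small Linux-specific sketch (illustrative) makes this visible by printing the kernel thread ID from the main thread and from a created thread; the two IDs differ.

    #define _GNU_SOURCE
    #include <pthread.h>
    #include <stdio.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    static void *worker(void *arg)
    {
        /* Kernel thread ID backing this user thread */
        printf("worker kernel TID = %ld\n", (long)syscall(SYS_gettid));
        return NULL;
    }

    int main(void)
    {
        pthread_t t;
        printf("main   kernel TID = %ld\n", (long)syscall(SYS_gettid));
        pthread_create(&t, NULL, worker, NULL);   /* also creates a kernel thread */
        pthread_join(t, NULL);
        return 0;
    }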
Many-to-Many Model:
The many-to-many model multiplexes many
user-level threads to a smaller or equal
number of kernel threads.
The number of kernel threads may be specific
to either a particular application or a particular
machine.
Developers can create as many user threads
as necessary, and the corresponding kernel
threads can run in parallel on a
multiprocessor.
Multithreading: ILP Support to Exploit
Thread-Level Parallelism
Although ILP increases the performance of a system, it can be quite
limited or hard to exploit in some applications.
Furthermore, there may be parallelism occurring naturally at a higher
level in the application.
For example:
An online transaction-processing system has parallelism among the
multiple queries and updates. These queries and updates can be
processed mostly in parallel, since they are largely independent of one
another.
This higher-level parallelism is called thread-level parallelism (TLP)
because it is logically structured as separate threads of execution.
ILP consists of parallel operations within a loop or straight-line code,
whereas TLP is expressed through multiple threads of execution that
run in parallel.
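A minimal sketch of the distinction (array names and sizes are illustrative): within one thread, the independent element-wise additions in the loop are a source of ILP that pipelined, multiple-issue hardware can overlap; splitting the same iteration range across two POSIX threads expresses the work as TLP instead.

    #include <pthread.h>
    #include <stdio.h>

    #define N 1000
    static double a[N], b[N], c[N];

    /* ILP: the iterations are independent, so a superscalar core can
       overlap several additions within a single thread of execution. */
    static void add_range(int lo, int hi)
    {
        for (int i = lo; i < hi; i++)
            c[i] = a[i] + b[i];
    }

    /* TLP: the same work split across two threads that run in parallel. */
    static void *half(void *arg)
    {
        int which = *(int *)arg;
        add_range(which * (N / 2), (which + 1) * (N / 2));
        return NULL;
    }

    int main(void)
    {
        for (int i = 0; i < N; i++) { a[i] = i; b[i] = 2 * i; }

        pthread_t t0, t1;
        int id0 = 0, id1 = 1;
        pthread_create(&t0, NULL, half, &id0);
        pthread_create(&t1, NULL, half, &id1);
        pthread_join(t0, NULL);
        pthread_join(t1, NULL);

        printf("c[10] = %.1f\n", c[10]);   /* 30.0 */
        return 0;
    }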
Thread-level parallelism is an important alternative to instruction-
level parallelism.
In many applications thread-level parallelism occurs naturally (many
server applications).
If software is written from scratch, expressing the parallelism is much
easier.
But for established applications written without parallelism in mind,
there can be significant challenges, and it can be extremely costly to
rewrite them to exploit thread-level parallelism.
There are two main approaches to multithreading:
fine-grained multithreading and
coarse-grained multithreading.
Fine-grained multithreading:
It switches between threads on each instruction, causing the
execution of multiple threads to be interleaved.
This interleaving is often done in a round-robin fashion.
To make fine-grained multithreading practical, the CPU must be
able to switch threads on every clock cycle.
Coarse-grained multithreading:
It was invented as an alternative to fine-grained multithreading.
Coarse-grained multithreading switches threads only on costly stalls,
such as level-2 cache misses.
This relieves the need to make thread switching essentially free,
since switches are rare.
The main difference between fine-grained and coarse-grained
multithreading is that in fine-grained multithreading the threads issue
instructions in a round-robin manner, while in coarse-grained
multithreading a thread issues instructions until a stall occurs.
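The contrast can be illustrated with a toy issue-order simulation (entirely illustrative; it models only the order in which instructions issue, not their latencies). Each thread is a string of instructions, and 'S' marks an instruction that causes a long stall such as a cache miss: the fine-grained policy switches threads every cycle, while the coarse-grained policy switches only after a stall.

    #include <stdio.h>

    #define LEN 5
    /* 'S' marks an instruction that causes a long stall (e.g., a cache miss). */
    static const char *thr[2] = { "aaSaa", "bbbSb" };

    static void fine_grained(void)             /* switch threads every cycle */
    {
        int pos[2] = { 0, 0 }, cur = 0;
        printf("fine-grained   : ");
        while (pos[0] < LEN || pos[1] < LEN) {
            if (pos[cur] < LEN)
                putchar(thr[cur][pos[cur]++]);
            cur = 1 - cur;                     /* round-robin */
        }
        putchar('\n');                         /* prints ababSbaSab */
    }

    static void coarse_grained(void)           /* switch threads only on a stall */
    {
        int pos[2] = { 0, 0 }, cur = 0;
        printf("coarse-grained : ");
        while (pos[0] < LEN || pos[1] < LEN) {
            if (pos[cur] < LEN) {
                char c = thr[cur][pos[cur]++];
                putchar(c);
                if (c == 'S') cur = 1 - cur;   /* costly stall: switch thread */
            } else {
                cur = 1 - cur;                 /* current thread finished */
            }
        }
        putchar('\n');                         /* prints aaSbbbSaab */
    }

    int main(void)
    {
        fine_grained();
        coarse_grained();
        return 0;
    }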
SCALAR PROCESSOR
A scalar processor is classified as an SISD (single instruction,
single data) processor. A scalar processor processes only one datum
at a time.
In a scalar organization, a single pipelined functional
unit exists for:
• integer operations; and
• another for floating-point operations.
Functional unit:
• Part of the CPU responsible for calculations
SUPERSCALAR PROCESSOR
A superscalar processor is a CPU that implements
a form of parallelism called instruction-level
parallelism within a single processor.
A superscalar CPU can execute more than one
instruction per clock cycle. At a given clock rate
(measured in clock cycles per second, i.e., megahertz
or gigahertz), a superscalar processor will therefore be
faster than a scalar processor.
It has the ability to execute instructions in different
pipelines:
• independently and concurrently.
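For intuition, a sketch (the variables are illustrative): the two statements below have no data dependence on each other, so a two-way superscalar processor can issue the corresponding instructions in the same clock cycle, one per pipeline.

    #include <stdio.h>

    int main(void)
    {
        int x = 1, y = 2, p = 3, q = 4;
        /* No dependence between these two statements: a two-way
           superscalar machine may issue both in one cycle. */
        int a = x + y;
        int b = p * q;
        printf("%d %d\n", a, b);
        return 0;
    }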
PIPELINE PROBLEMS:
The pipeline concept itself introduces some problems.
A resource (structural) hazard exists when the hardware resource
required by an instruction is unavailable because a previous
instruction is still using it.
Data hazards:
occur when the pipeline changes the order of
read/write accesses to operands so that the order
differs from the order seen by sequentially
executing instructions on the unpipelined machine.
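A small sketch of a data hazard (variable names are illustrative): the second statement reads a value that the first statement produces, a read-after-write (RAW) dependence, so the pipeline must forward the result or stall to keep the same answer as sequential execution.

    #include <stdio.h>

    int main(void)
    {
        int b = 2, c = 3, e = 10;
        int a = b + c;             /* writes a                                   */
        int d = a + e;             /* reads a: RAW hazard on the previous result */
        printf("%d %d\n", a, d);   /* 5 15 */
        return 0;
    }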
(Figure: execution scenario.)
Simultaneous multithreading (SMT)
• A mix of the superscalar and multithreading techniques.
• All hardware contexts are active, leading to competition for resources.
• Multiple instructions are issued from multiple threads in the same cycle.
• Both TLP and ILP come into play.
• Issue slots in each cycle are filled with instructions from different threads.
• Key design issues: resource organization and resource sharing.
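Intel's Hyper-Threading (HT) is Intel's implementation of SMT. The slot-filling idea can be sketched with a toy loop (illustrative only, not a real scheduler): each cycle, up to ISSUE_WIDTH issue slots are filled with instructions drawn from whichever threads still have work, so TLP (several threads) and ILP (several slots per cycle) are exploited together.

    #include <stdio.h>

    #define ISSUE_WIDTH 4
    #define NTHREADS    2

    int main(void)
    {
        /* Per-thread instruction streams; the letter identifies the thread. */
        const char *stream[NTHREADS] = { "AAAAAA", "BBBB" };
        int len[NTHREADS] = { 6, 4 };
        int pos[NTHREADS] = { 0, 0 };
        int remaining = 10;

        for (int cycle = 0; remaining > 0; cycle++) {
            printf("cycle %d: ", cycle);
            int slots = ISSUE_WIDTH;
            /* Fill the issue slots round-robin from threads that still
               have instructions; no single thread has to fill them all. */
            for (int t = 0; slots > 0 && remaining > 0; t = (t + 1) % NTHREADS) {
                if (pos[t] < len[t]) {
                    putchar(stream[t][pos[t]++]);
                    slots--;
                    remaining--;
                }
            }
            putchar('\n');   /* prints ABAB, ABAB, AA over three cycles */
        }
        return 0;
    }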