Module-2 Introduction and Performance Analysis

Module-2: Computer Performance and Measurement
Performance
• What do we mean by the performance of a
computer?
– Hardware performance
– A program runs in less time
– A computer is faster when it completes more
tasks per unit time
Performance
Two important metrics
• Response Time or Latency or Execution Time – Time
taken to complete a single job (time between the
start and the completion of an event). Smaller is better.
• Throughput – Number of jobs done per unit of time.
Larger is better.

Does one imply the other?

• Yes. E.g., if latency decreases, throughput will increase.
• No. E.g., in pipelining, latency may have to be increased to
increase throughput!
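
The pipelining case can be made concrete with a small Python sketch. All numbers here are hypothetical (a 5 ns unpipelined instruction, a 5-stage pipeline, and an assumed 0.2 ns of register overhead per stage), and the pipeline is assumed to run in steady state with no stalls.

```python
# All numbers below are hypothetical, chosen only to illustrate the trade-off.
UNPIPELINED_LATENCY_NS = 5.0  # time for one instruction without pipelining
STAGES = 5                    # assumed pipeline depth
STAGE_TIME_NS = 1.2           # assumed 1.0 ns of logic + 0.2 ns register overhead


def pipelined_latency_ns(stages, stage_time_ns):
    """Time for one instruction to pass through every pipeline stage."""
    return stages * stage_time_ns


def throughput_per_ns(time_between_completions_ns):
    """Instructions completed per nanosecond in steady state (no stalls)."""
    return 1.0 / time_between_completions_ns


print("unpipelined:", UNPIPELINED_LATENCY_NS, "ns latency,",
      throughput_per_ns(UNPIPELINED_LATENCY_NS), "instr/ns")
print("pipelined:  ", pipelined_latency_ns(STAGES, STAGE_TIME_NS), "ns latency,",
      throughput_per_ns(STAGE_TIME_NS), "instr/ns")
```

Under these assumptions the latency of a single instruction grows, while the number of instructions completed per unit time rises several times over.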

Comparing Design Alternatives
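• “X is n times faster than Y” means:

$$\frac{\text{Execution time}_Y}{\text{Execution time}_X} = n$$

• Since execution time is the reciprocal of
performance, the following relationship
holds:

$$n = \frac{\text{Execution time}_Y}{\text{Execution time}_X} = \frac{1/\text{Performance}_Y}{1/\text{Performance}_X} = \frac{\text{Performance}_X}{\text{Performance}_Y}$$

• Example phrasing: “the throughput of X is 1.3 times higher than Y”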
Relative Performance
• If computer A runs a program in 10 sec
and Computer B runs the same program in
15 sec, how much faster is A than B ?
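• We know that A is n times faster than B if

$$\frac{\text{Performance}_A}{\text{Performance}_B} = \frac{\text{Execution time}_B}{\text{Execution time}_A} = n$$

• Thus the performance ratio is 15/10 = 1.5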

• A is therefore 1.5 times faster than B.
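
The same ratio can be written as a one-line helper. This is a minimal sketch; the function name `speedup` is chosen here for illustration.

```python
def speedup(time_a_s, time_b_s):
    """Return n such that machine A is n times faster than machine B."""
    if time_a_s <= 0 or time_b_s <= 0:
        raise ValueError("execution times must be positive")
    return time_b_s / time_a_s


# A runs the program in 10 s, B runs it in 15 s.
print(speedup(10.0, 15.0))  # 1.5
```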

Measuring Performance
• Time is not always the metric quoted when
comparing the performance of computers.
• The most reliable measure of performance is the
execution time of real programs.
• What is time?
– Wall-clock time
– Response time or elapsed time
– Disk accesses, memory accesses, input/output activities
– Operating system overhead
– CPU time (user CPU time or system CPU time) or I/O
time
Measuring Performance
• Computer designers measure how
fast the hardware can perform basic
functions.
• Computers are constructed using a clock
that runs at a constant rate and
determines when events take place in the
hardware.
• Key terms: clock cycles, clock period, clock
rate (inverse of clock period)
How can we Improve Performance ?

• The number of instructions can be reduced by:
– Better instruction set architecture (ISA)
– Better compiler
– Better algorithm
• Clocks per instruction (CPI) can be reduced by:
– Better hardware design
– Making the common case faster
• Clock rate can be increased by:
– Hardware design
CPU Performance Equation

$$\text{CPU time} = \text{Clock cycles needed} \times \text{Clock cycle time}$$

$$\text{CPU time} = \frac{\text{No. of instructions} \times \text{Clocks per instruction}}{\text{Clock rate}}$$

Which is this: response time or throughput?
CPU Performance Evaluation: CPI
• Most computers run synchronously, utilizing a CPU clock running at a
constant clock rate (or clock frequency f), where:
Clock rate = 1 / clock cycle time, i.e. f = 1 / C
[Figure: clock signal showing cycle 1, cycle 2, cycle 3]
• The CPU clock rate depends on the specific CPU organization (design) and
hardware implementation technology (VLSI) used.
• A computer machine instruction comprises a number of micro operations,
which vary in number and complexity depending on the instruction type and the
exact CPU organization (design).
– A micro operation is an elementary hardware operation that can be
performed during one CPU clock cycle.
– This corresponds to one micro-instruction in micro-programmed CPUs.
– Examples: register operations (shift, load, clear, increment) and ALU operations
(add, subtract, etc.).
• Thus, a single machine instruction may take one or more CPU cycles to
complete; this number is termed the Cycles Per Instruction (CPI).
• Average (or effective) CPI of a program: the average CPI of all instructions
executed in the program on a given CPU design.
• Units: Cycles/sec = Hertz = Hz; MHz = 10^6 Hz; GHz = 10^9 Hz
• Instructions Per Cycle = IPC = 1/CPI
Generic CPU Instruction Processing Steps

• Instruction Fetch: Obtain instruction from program memory.
The Program Counter (PC) points to the instruction to be processed.
• Instruction Decode: Determine required actions and instruction size.
• Operand Fetch: Locate and obtain operand data
from data memory or registers.
• Execute: Compute result value or status.
• Result Store: Deposit results in storage (data memory or
register) for later use.
• Next Instruction: Determine successor or next instruction
(i.e., update PC to fetch the next instruction to be processed).
Computer Performance Measures: Program
Execution Time
• A specific program compiled to run on a specific machine (CPU)
“A” has the following parameters:
– I: The total executed instruction count of the program.
– CPI: The average number of cycles per instruction (average or effective CPI).
– C: Clock cycle time of machine “A”.
So…The Questions are..

– How to compare performance of two different machines?
– What factors affect performance?
– How to improve performance?

• How can one measure the performance of this machine (CPU)
running this program?
– The machine (or CPU) is said to be faster, or to have better performance
running this program, if the total execution time is shorter.

PerformanceA = 1 / Execution TimeA
(PerformanceA is in programs/second; Execution TimeA is in seconds/program)
Comparing Computer Performance Using Execution Time
• To compare the performance of two machines (or CPUs) “A”, “B” running
the same program:
PerformanceA = 1 / Execution TimeA
PerformanceB = 1 / Execution TimeB
• Machine A is n times faster than machine B (or slower if n < 1) means:

$$\text{Speedup} = n = \frac{\text{Performance}_A}{\text{Performance}_B} = \frac{\text{Execution Time}_B}{\text{Execution Time}_A}$$

(i.e., speedup is a ratio of performance, with no units)

• Example:
For a given program:
Execution time on machine A: ExecutionA = 1 second
Execution time on machine B: ExecutionB = 10 seconds
Speedup=
PerformanceA / PerformanceB = Execution TimeB / Execution TimeA
= 10 / 1 = 10

CPU Execution Time: The CPU Equation
• A program comprises a number of executed instructions, I
– Measured in: instructions/program
• The average instruction executed takes a number of cycles per
instruction (CPI) to be completed.
– Measured in: cycles/instruction, CPI
• The CPU has a fixed clock cycle time C = 1/clock rate
– Measured in: seconds/cycle
• CPU execution time is the product of the above three
parameters as follows:

$$\text{CPU time} = \frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Cycle}}$$

T = I x CPI x C

where T is the execution time per program in seconds, I is the number of
instructions executed, CPI is the average CPI for the program, and C is the
CPU clock cycle time.

This equation is commonly known as the CPU performance equation.
CPU Performance Equation
For a given program executed on a given machine (CPU):
CPI = Total program execution cycles / Instructions count
(i.e average or effective CPI)

CPU clock cycles = Instruction count x CPI

CPU execution time = CPU clock cycles x Clock cycle time
= Instruction count x CPI x Clock cycle time
CPU Execution Time: Example
• A Program is running on a specific machine (CPU) with
the following parameters:
– Total executed instruction count: 10,000,000 instructions
– Average CPI for the program: 2.5 cycles/instruction.
– CPU clock rate: 200 MHz.
What is the execution time for this program??

$$\text{CPU time} = \frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Cycle}}$$

CPU time = Instruction count x CPI x Clock cycle time
= 10,000,000 x 2.5 x 1 / clock rate
= 10,000,000 x 2.5 x 5 x 10^-9
= 0.125 seconds
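
The calculation above can be reproduced with a short Python sketch. The function name `cpu_time_seconds` is chosen here for illustration; the inputs are the values given in the example.

```python
def cpu_time_seconds(instruction_count, cpi, clock_rate_hz):
    """CPU time T = I x CPI x C, where C = 1 / clock rate."""
    clock_cycle_s = 1.0 / clock_rate_hz
    return instruction_count * cpi * clock_cycle_s


# 10,000,000 instructions, average CPI of 2.5, 200 MHz clock
t = cpu_time_seconds(10_000_000, 2.5, 200e6)
print(f"{t:.3f} s")  # 0.125 s
```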

Aspects of CPU Execution Time
CPU Time = Instruction count executed x CPI x Clock cycle

T = I x CPI x C

• I (instruction count, executed) depends on: program used, compiler, ISA
• CPI (average CPI) depends on: program used, compiler, ISA, CPU organization
• C (clock cycle time, CCT) depends on: CPU organization, technology (VLSI)
Factors Affecting CPU
Performance
$$\text{CPU time} = \frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Cycle}}$$

                                     Instruction   Cycles per    Clock Rate
                                     Count         Instruction   (1/CCT)
Program                              X             X
Compiler                             X             X
Instruction Set Architecture (ISA)   X             X
Organization (CPU Design)                          X             X
Technology (VLSI)                                                X
Performance Comparison: Example
• From the previous example: A program is running on a specific machine
(CPU) with the following parameters:
– Total executed instruction count, I: 10,000,000 instructions
– Average CPI for the program: 2.5 cycles/instruction.
– CPU clock rate: 200 MHz. Thus: CCT = 1/(200 x 10^6) = 5 x 10^-9 seconds

• Using the same program with these changes:
– A new compiler used: New executed instruction count, I: 9,500,000
New CPI: 3.0
– Faster CPU implementation: New clock rate = 300 MHz
• What is the speedup with the changes?

Speedup = (10,000,000 x 2.5 x 5 x 10^-9) / (9,500,000 x 3 x 3.33 x 10^-9)
= .125 / .095 = 1.32
or 32 % faster after changes.
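
The comparison can be checked with a few lines of Python. This is a minimal sketch that applies T = I x CPI / clock rate to both configurations; the helper name is illustrative.

```python
def cpu_time_seconds(instruction_count, cpi, clock_rate_hz):
    """CPU time T = I x CPI / clock rate."""
    return instruction_count * cpi / clock_rate_hz


old_time = cpu_time_seconds(10_000_000, 2.5, 200e6)  # original compiler and CPU
new_time = cpu_time_seconds(9_500_000, 3.0, 300e6)   # new compiler, faster clock
speedup = old_time / new_time

print(f"old = {old_time:.3f} s, new = {new_time:.3f} s, speedup = {speedup:.2f}")
# old = 0.125 s, new = 0.095 s, speedup = 1.32
```

Note that the speedup is positive even though the CPI got worse, because the lower instruction count and the higher clock rate outweigh it.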

$$\text{Speedup} = \frac{\text{Old Execution Time}}{\text{New Execution Time}} = \frac{I_{old} \times CPI_{old} \times \text{Clock cycle time}_{old}}{I_{new} \times CPI_{new} \times \text{Clock cycle time}_{new}}$$
Instruction Types & CPI
• Given a program with n types or classes of instructions executed on a given
CPU with the following characteristics:

C_i = count of instructions of type i executed
CPI_i = cycles per instruction for type i (depends on CPU design)
i = 1, 2, …, n

Then:
CPI = CPU clock cycles / Instruction count I (i.e., average or effective CPI)

$$\text{CPU clock cycles} = \sum_{i=1}^{n} (CPI_i \times C_i)$$

Also, the executed instruction count is:

$$I = \sum_{i=1}^{n} C_i$$
Instruction Types & CPI: An Example
• An instruction set has three instruction classes (for a specific CPU design):

Instruction class   CPI
A                   1
B                   2
C                   3

• Two code sequences have the following instruction counts:

                 Instruction counts for instruction class
Code sequence    A    B    C
1                2    1    2
2                4    1    1

• CPU cycles for sequence 1 = 2 x 1 + 1 x 2 + 2 x 3 = 10 cycles
CPI for sequence 1 (effective CPI) = clock cycles / instruction count
= 10 / 5 = 2
• CPU cycles for sequence 2 = 4 x 1 + 1 x 2 + 1 x 3 = 9 cycles
CPI for sequence 2 = 9 / 6 = 1.5
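
The two sequences can be evaluated with a small Python sketch of CPU clock cycles = sum of CPI_i x C_i. The dictionary layout and function names are illustrative choices, not part of the original material.

```python
# CPI of each instruction class, from the table above
CLASS_CPI = {"A": 1, "B": 2, "C": 3}


def total_cycles(counts):
    """CPU clock cycles = sum over classes of CPI_i x C_i."""
    return sum(CLASS_CPI[cls] * n for cls, n in counts.items())


def effective_cpi(counts):
    """Effective CPI = CPU clock cycles / executed instruction count."""
    return total_cycles(counts) / sum(counts.values())


seq1 = {"A": 2, "B": 1, "C": 2}
seq2 = {"A": 4, "B": 1, "C": 1}
for name, seq in (("sequence 1", seq1), ("sequence 2", seq2)):
    print(name, total_cycles(seq), effective_cpi(seq))
# sequence 1 10 2.0
# sequence 2 9 1.5
```

Sequence 2 executes more instructions but needs fewer cycles, so it is the faster of the two on this CPU.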
Instruction Frequency & CPI
• Given a program with n types or classes of
instructions with the following characteristics:
C_i = count of instructions of type i executed, i = 1, 2, …, n
CPI_i = average cycles per instruction of type i
F_i = frequency or fraction of instruction type i executed
= C_i / total executed instruction count = C_i / I

where the executed instruction count is

$$I = \sum_{i=1}^{n} C_i$$

Then:

$$CPI = \sum_{i=1}^{n} (CPI_i \times F_i)$$

(i.e., average or effective CPI)

$$\text{Fraction of total execution time for instructions of type } i = \frac{CPI_i \times F_i}{CPI}$$
Instruction Type Frequency & CPI:
A RISC Example
Program profile or executed instruction mix, base machine (Reg / Reg).
Freq (F_i) is given by a typical mix; CPI_i depends on CPU design;
% Time = CPI_i x F_i / CPI.

Op       Freq, F_i   CPI_i   CPI_i x F_i   % Time
ALU      50%         1       .5            23% = .5/2.2
Load     20%         5       1.0           45% = 1/2.2
Store    10%         3       .3            14% = .3/2.2
Branch   20%         2       .4            18% = .4/2.2
                             Sum = 2.2

$$CPI = \sum_{i=1}^{n} (CPI_i \times F_i)$$

(i.e., average or effective CPI)
CPI = .5 x 1 + .2 x 5 + .1 x 3 + .2 x 2 = 2.2
= .5 + 1 + .3 + .4
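
The table can be recomputed with a short Python sketch. The `MIX` dictionary simply restates the frequencies and per-type CPIs given above; the function names are illustrative.

```python
# (frequency F_i, CPI_i) for each instruction type, from the table above
MIX = {"ALU": (0.5, 1), "Load": (0.2, 5), "Store": (0.1, 3), "Branch": (0.2, 2)}


def effective_cpi(mix):
    """CPI = sum over types of CPI_i x F_i."""
    return sum(freq * cpi for freq, cpi in mix.values())


def time_fractions(mix):
    """Fraction of execution time per type = CPI_i x F_i / CPI."""
    total = effective_cpi(mix)
    return {op: freq * cpi / total for op, (freq, cpi) in mix.items()}


print(f"CPI = {effective_cpi(MIX):.1f}")
for op, fraction in time_fractions(MIX).items():
    print(f"{op}: {fraction:.0%}")
```

Loads make up only 20% of the executed instructions but account for the largest share of execution time, because their CPI is the highest.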
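Metrics (Measures) of Computer Performance

[Figure: levels of a computer system, from top to bottom (application, programming
language, compiler, ISA, datapath and control, function units, transistors/wires/pins),
with the metric typically quoted at each level]
• Application level: execution time (target workload, SPEC, etc.)
• Compiler / ISA level: (millions) of instructions per second – MIPS;
(millions) of (F.P.) operations per second – MFLOP/s
• Datapath / control level: megabytes per second
• Function units / transistors, wires, pins: cycles per second (clock rate)

Each metric has a purpose, and each can be misused.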
Choosing Programs To Evaluate
Performance
Levels of programs or benchmarks that could be used to evaluate
performance:
– Actual Target Workload: Full applications that run on the
target machine.
– Real Full Program-based Benchmarks:
• Select a specific mix or suite of programs that are typical of
targeted applications or workload (e.g., SPEC95, SPEC
CPU2000).
– Small “Kernel” Benchmarks:
• Key computationally-intensive pieces extracted from real programs.
– Examples: Matrix factorization, FFT, tree search, etc.
• Best used to test specific aspects of the machine.
– Microbenchmarks (also called synthetic benchmarks):
• Small, specially written programs to isolate a specific aspect of
performance characteristics: processing (integer, floating point),
local memory, input/output, etc.
Computer Performance Measures :
MIPS (Million Instructions Per Second) Rating
• For a specific program running on a specific CPU, the MIPS rating is a measure of
how many millions of instructions are executed per second:
MIPS Rating = Instruction count / (Execution time x 10^6)
= Instruction count / (CPU clocks x Cycle time x 10^6)
= (Instruction count x Clock rate) / (Instruction count x CPI x 10^6)
= Clock rate / (CPI x 10^6)
• Major problem with the MIPS rating: as shown above, the MIPS rating does not account for the
count of instructions executed (I).
– A higher MIPS rating in many cases may not mean higher performance or
better execution time, e.g. due to compiler design variations.
• In addition, the MIPS rating:
– Does not account for the instruction set architecture (ISA) used.
• Thus it cannot be used to compare computers/CPUs with different instruction sets.
– Is easy to abuse: the program used to get the MIPS rating is often omitted.
• Often the peak MIPS rating is provided for a given CPU, which is obtained using a
program comprised entirely of instructions with the lowest CPI for the given CPU
design and which does not represent real programs.
Computer Performance Measures :
MIPS (Million Instructions Per Second) Rating

• Under what conditions can the MIPS rating be
used to compare performance of different
CPUs?
• The MIPS rating is only valid to compare the performance of different
CPUs provided that the following conditions are satisfied:
1. The same program is used
(actually this applies to all performance metrics)
2. The same ISA is used
3. The same compiler is used

(Thus the resulting programs used to run on the CPUs and
obtain the MIPS rating are identical at the machine code (binary) level,
including the same instruction count.)
Compiler Variations, MIPS & Performance:
An Example
• For a machine (CPU) with instruction classes:

Instruction class   CPI
A                   1
B                   2
C                   3

• For a given high-level language program, two compilers produced
the following executed instruction counts:

              Instruction counts (in millions)
              for each instruction class
Code from:    A     B    C
Compiler 1    5     1    1
Compiler 2    10    1    1

• The machine is assumed to run at a clock rate of 100 MHz.
Compiler Variations, MIPS & Performance:
An Example (Continued)
MIPS = Clock rate / (CPI x 10^6) = (100 x 10^6) / (CPI x 10^6)

CPI = CPU execution cycles / Instruction count

$$\text{CPU clock cycles} = \sum_{i=1}^{n} (CPI_i \times C_i)$$

CPU time = Instruction count x CPI / Clock rate

• For compiler 1:
– CPI1 = (5 x 1 + 1 x 2 + 1 x 3) / (5 + 1 + 1) = 10 / 7 = 1.43
– MIPS Rating1 = (100 x 10^6) / (1.428 x 10^6) = 70.0 MIPS
– CPU time1 = ((5 + 1 + 1) x 10^6 x 1.43) / (100 x 10^6) = 0.10 seconds
• For compiler 2:
– CPI2 = (10 x 1 + 1 x 2 + 1 x 3) / (10 + 1 + 1) = 15 / 12 = 1.25
– MIPS Rating2 = (100 x 10^6) / (1.25 x 10^6) = 80.0 MIPS
– CPU time2 = ((10 + 1 + 1) x 10^6 x 1.25) / (100 x 10^6) = 0.15 seconds

The MIPS rating indicates that compiler 2 is better, while in reality the code produced by
compiler 1 is faster.
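
The mismatch between MIPS rating and execution time can be reproduced with a short Python sketch. The instruction counts, class CPIs, and 100 MHz clock are taken from the example; the `analyze` helper is an illustrative name.

```python
CLASS_CPI = {"A": 1, "B": 2, "C": 3}  # cycles per instruction for each class
CLOCK_RATE_HZ = 100e6                 # 100 MHz


def analyze(counts_millions):
    """Return (CPI, MIPS rating, CPU time in seconds) for one compiler's output."""
    instructions = sum(counts_millions.values()) * 1e6
    cycles = sum(CLASS_CPI[cls] * n for cls, n in counts_millions.items()) * 1e6
    cpi = cycles / instructions
    mips = CLOCK_RATE_HZ / (cpi * 1e6)
    cpu_time_s = cycles / CLOCK_RATE_HZ
    return cpi, mips, cpu_time_s


for name, counts in (("compiler 1", {"A": 5, "B": 1, "C": 1}),
                     ("compiler 2", {"A": 10, "B": 1, "C": 1})):
    cpi, mips, t = analyze(counts)
    print(f"{name}: CPI = {cpi:.2f}, MIPS = {mips:.1f}, CPU time = {t:.2f} s")
```

Compiler 2 earns the higher MIPS rating only because it executes many more cheap class A instructions; its total cycle count, and therefore its CPU time, is larger.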
Computer Performance Measures :
MFLOPS (Million FLOating-Point Operations Per Second)
• A floating-point operation is an addition, subtraction, multiplication, or division
operation applied to numbers represented by a single or a double precision
floating-point representation.
• MFLOPS, for a specific program running on a specific computer, is a measure of
millions of floating-point operations (megaflops) per second:

MFLOPS = Number of floating-point operations / (Execution time x 10^6)

• The MFLOPS rating is a better comparison measure between different machines
than the MIPS rating.
– Applicable even if ISAs are different
• Program-dependent: Different programs have different percentages of floating-point
operations present; e.g., compilers have no floating-point operations and
yield a MFLOPS rating of zero.
• Dependent on the type of floating-point operations present in the program.
– Peak MFLOPS rating for a CPU: Obtained using a program comprised
entirely of the simplest floating-point instructions (with the lowest CPI) for the
given CPU design, which does not represent real floating-point programs.
Quantitative Principles
of Computer Design
Amdahl’s Law:
The performance gain from improving some portion of
a computer (i.e., using some enhancement) is calculated by:

$$\text{Speedup} = \frac{\text{Performance for entire task using the enhancement}}{\text{Performance for entire task without using the enhancement}}$$

or

$$\text{Speedup} = \frac{\text{Execution time for entire task without the enhancement}}{\text{Execution time for entire task using the enhancement}}$$

Recall: Performance = 1 / Execution Time
Performance Enhancement Calculations:
Amdahl's Law
• The performance enhancement possible due to a given design
improvement is limited by the amount that the improved feature is
used.
• Amdahl’s Law:
Performance improvement or speedup due to enhancement E:

$$\text{Speedup}(E) = \frac{\text{Execution Time without } E}{\text{Execution Time with } E} = \frac{\text{Performance with } E}{\text{Performance without } E}$$

– Suppose that enhancement E accelerates a fraction F of the
original execution time by a factor S and the remainder of the time is
unaffected. Then:

Execution Time with E = ((1 - F) + F/S) x Execution Time without E

Hence speedup is given by:

$$\text{Speedup}(E) = \frac{\text{Execution Time without } E}{((1 - F) + F/S) \times \text{Execution Time without } E} = \frac{1}{(1 - F) + F/S}$$

Note: F (fraction of execution time enhanced) refers to the original execution time before the
enhancement is applied.
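
Amdahl's Law translates directly into a small function. This is a minimal Python sketch; the function name and the sample inputs (F = 0.5 with S = 2 and with a very large S) are hypothetical and chosen only to show how the unaffected fraction limits the gain.

```python
def amdahl_speedup(fraction_enhanced, enhancement_factor):
    """Speedup(E) = 1 / ((1 - F) + F / S), with F measured on the original time."""
    if not 0.0 <= fraction_enhanced <= 1.0:
        raise ValueError("F must be between 0 and 1")
    if enhancement_factor <= 0:
        raise ValueError("S must be positive")
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / enhancement_factor)


# Hypothetical inputs: half of the execution time is affected.
print(f"{amdahl_speedup(0.5, 2.0):.3f}")  # 1.333
print(f"{amdahl_speedup(0.5, 1e9):.3f}")  # 2.000 (approaches 1 / (1 - F))
```

Even an enormous enhancement factor cannot push the overall speedup past 1 / (1 - F).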
Pictorial Depiction of Amdahl’s Law
Enhancement E accelerates fraction F of original execution time by a factor of S.

[Figure: two bars comparing execution time before and after the enhancement]
Before: execution time without enhancement E (i.e., before the enhancement is applied),
shown normalized to 1 = (1 - F) + F
• Unaffected fraction: (1 - F)
• Affected fraction: F
After: execution time with enhancement E = (1 - F) + F/S
• Unaffected fraction: (1 - F), unchanged
• Affected fraction reduced to F/S

$$\text{Speedup}(E) = \frac{\text{Execution Time without enhancement } E}{\text{Execution Time with enhancement } E} = \frac{1}{(1 - F) + F/S}$$

What if the fraction given is after the
enhancement has been applied?
How would you solve the problem (i.e., find an expression for speedup)?
Performance Enhancement Example
• For the RISC machine with the following instruction mix given earlier:

Op       Freq   Cycles   CPI(i)   % Time
ALU      50%    1        .5       23%
Load     20%    5        1.0      45%
Store    10%    3        .3       14%
Branch   20%    2        .4       18%
                         CPI = 2.2

If a CPU design enhancement improves the CPI of load instructions
from 5 to 2, what is the resulting performance improvement from this
enhancement?
Fraction enhanced = F = 45% or .45
Unaffected fraction = 1 - F = 100% - 45% = 55% or .55
Factor of enhancement = S = 5/2 = 2.5
Using Amdahl’s Law:

$$\text{Speedup}(E) = \frac{1}{(1 - F) + F/S} = \frac{1}{.55 + .45/2.5} = 1.37$$
An Alternative Solution Using CPU Equation
Op       Freq   Cycles   CPI(i)   % Time
ALU      50%    1        .5       23%
Load     20%    5        1.0      45%
Store    10%    3        .3       14%
Branch   20%    2        .4       18%
                         CPI = 2.2

If a CPU design enhancement improves the CPI of load instructions
from 5 to 2, what is the resulting performance improvement from this
enhancement?
New CPI of load is now 2 instead of 5
Old CPI = 2.2
New CPI = .5 x 1 + .2 x 2 + .1 x 3 + .2 x 2 = 1.6

$$\text{Speedup}(E) = \frac{\text{Original Execution Time}}{\text{New Execution Time}} = \frac{\text{Instruction count} \times \text{old CPI} \times \text{clock cycle}}{\text{Instruction count} \times \text{new CPI} \times \text{clock cycle}} = \frac{\text{old CPI}}{\text{new CPI}} = \frac{2.2}{1.6} = 1.37$$
Which is the same speedup obtained from Amdahl’s Law in the first solution.
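
The agreement between the two methods can be checked numerically. The Python sketch below uses the instruction mix from the table; variable names are illustrative. It uses the unrounded load fraction 1.0/2.2 (about 0.4545) rather than the rounded .45, which is why both methods give exactly the same value here.

```python
# (frequency F_i, CPI_i) for each instruction type, from the table above
MIX = {"ALU": (0.5, 1), "Load": (0.2, 5), "Store": (0.1, 3), "Branch": (0.2, 2)}


def effective_cpi(mix):
    """CPI = sum over types of CPI_i x F_i."""
    return sum(freq * cpi for freq, cpi in mix.values())


# Method 1: CPU equation (instruction count and clock cycle cancel out).
old_cpi = effective_cpi(MIX)
new_cpi = effective_cpi(dict(MIX, Load=(0.2, 2)))  # load CPI improved from 5 to 2
speedup_cpu_equation = old_cpi / new_cpi

# Method 2: Amdahl's Law with F = share of original time spent in loads, S = 5/2.
load_fraction = 0.2 * 5 / old_cpi
speedup_amdahl = 1.0 / ((1.0 - load_fraction) + load_fraction / (5 / 2))

print(f"{speedup_cpu_equation:.3f} {speedup_amdahl:.3f}")
```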

Performance Enhancement Example
• A program runs in 100 seconds on a machine (CPU) with multiply operations
responsible for 80 seconds of this time. By how much must the
speed of multiplication be improved to make the program four times
faster?

Desired speedup = 4 = 100 / Execution time with enhancement

Execution time with enhancement = 100/4 = 25 seconds
25 seconds = (100 - 80 seconds) + 80 seconds / S
25 seconds = 20 seconds + 80 seconds / S
→ 5 = 80 seconds / S
→ S = 80/5 = 16

Hence multiplication should be 16 times
faster to get an overall speedup of 4.

Alternatively, it can also be solved by finding the enhanced fraction of execution time:
F = 80/100 = .8
and then solving Amdahl’s speedup equation for the desired enhancement factor S:

$$\text{Speedup}(E) = 4 = \frac{1}{(1 - F) + F/S} = \frac{1}{(1 - .8) + .8/S} = \frac{1}{.2 + .8/S}$$

Solving for S gives S = 16.
Performance Enhancement Example
• For the previous example with a program running in 100 seconds on
a machine with multiply operations responsible for 80 seconds of this
time. By how much must the speed of multiplication be improved to
make the program five times faster?
Desired speedup = 5 = 100 / Execution time with enhancement

Execution time with enhancement = 100/5 = 20 seconds
20 seconds = (100 - 80 seconds) + 80 seconds / S
20 seconds = 20 seconds + 80 seconds / S
→ 0 = 80 seconds / S

No amount of multiplication speed improvement can achieve this.
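
Both versions of this problem (a target speedup of 4 and of 5) can be solved with one helper that rearranges the same equation for S. This is a minimal Python sketch; the function name and the choice to return `None` for an unreachable target are illustrative assumptions.

```python
def required_factor(total_s, enhanced_s, desired_speedup):
    """Solve total/speedup = (total - enhanced) + enhanced/S for S.

    Returns None when the target cannot be reached by any finite S.
    """
    target_time = total_s / desired_speedup
    unaffected_time = total_s - enhanced_s
    remaining = target_time - unaffected_time  # time budget left for the enhanced part
    if remaining <= 0:
        return None
    return enhanced_s / remaining


print(required_factor(100, 80, 4))  # 16.0
print(required_factor(100, 80, 5))  # None
```

With 20 seconds unaffected, the overall speedup is bounded by 100/20 = 5, and that bound is only approached as S grows without limit.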

Extending Amdahl's Law To Multiple Enhancements
n enhancements, each affecting a different portion of execution time

• Suppose that enhancement E_i accelerates a fraction F_i of
the original execution time by a factor S_i, and the remainder
of the time is unaffected. Then, for i = 1, 2, …, n:

$$\text{Speedup} = \frac{\text{Original Execution Time}}{\left[\left(1 - \sum_i F_i\right) + \sum_i \frac{F_i}{S_i}\right] \times \text{Original Execution Time}}$$

$$\text{Speedup} = \frac{1}{\left(1 - \sum_i F_i\right) + \sum_i \frac{F_i}{S_i}}$$

where $\left(1 - \sum_i F_i\right)$ is the unaffected fraction.

Note: All fractions F_i refer to the original execution time before the
enhancements are applied.

What if the fractions given are
after the enhancements were applied?
How would you solve the problem (i.e., find an expression for speedup)?
Amdahl's Law With Multiple Enhancements: Example
• Three CPU performance enhancements are proposed with the following
speedups and percentage of the code execution time affected:
Speedup1 = S1 = 10    Percentage1 = F1 = 20%
Speedup2 = S2 = 15    Percentage2 = F2 = 15%
Speedup3 = S3 = 30    Percentage3 = F3 = 10%

• While all three enhancements are in place in the new design, each
enhancement affects a different portion of the code and only one
enhancement can be used at a time.
• What is the resulting overall speedup?

$$\text{Speedup} = \frac{1}{\left(1 - \sum_i F_i\right) + \sum_i \frac{F_i}{S_i}}$$

• Speedup = 1 / [(1 - .2 - .15 - .1) + .2/10 + .15/15 + .1/30]

= 1/ [ .55 + .0333 ]
= 1 / .5833 = 1.71
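
The multiple-enhancement form generalizes to any number of (F_i, S_i) pairs. The Python sketch below assumes, as in the example, that the enhancements affect disjoint portions of the original execution time; the function name is illustrative.

```python
def amdahl_multi(enhancements):
    """Speedup for disjoint enhancements given as (F_i, S_i) pairs.

    Each F_i is a fraction of the ORIGINAL execution time.
    """
    total_fraction = sum(f for f, _ in enhancements)
    if total_fraction > 1.0:
        raise ValueError("fractions must not sum to more than 1")
    enhanced_time = sum(f / s for f, s in enhancements)
    return 1.0 / ((1.0 - total_fraction) + enhanced_time)


result = amdahl_multi([(0.20, 10), (0.15, 15), (0.10, 30)])
print(f"{result:.2f}")  # 1.71
```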
Pictorial Depiction of Example
[Figure: execution time bars before and after the three enhancements]
Before: execution time with no enhancements = 1 (i.e., normalized to 1)
• Unaffected fraction: .55
• F1 = .2 (S1 = 10), F2 = .15 (S2 = 15), F3 = .1 (S3 = 30)
After: each affected fraction is divided by its factor (/ 10, / 15, / 30);
the unaffected fraction .55 is unchanged
Execution time with enhancements: .55 + .02 + .01 + .00333 = .5833
Speedup = 1 / .5833 = 1.71

Note: All fractions F_i refer to the original execution time.

What if the fractions given are after the enhancements were applied?
How would you solve the problem?
“Reverse” Multiple Enhancements Amdahl's
Law
• Multiple Enhancements Amdahl's Law assumes that the fractions given
refer to the original execution time.
• If, for each enhancement S_i, the fraction F_i it affects is given as a fraction of
the resulting execution time after the enhancements were applied, then:

$$\text{Speedup} = \frac{\left[\left(1 - \sum_i F_i\right) + \sum_i F_i \times S_i\right] \times \text{Resulting Execution Time}}{\text{Resulting Execution Time}}$$

$$\text{Speedup} = \frac{\left(1 - \sum_i F_i\right) + \sum_i F_i \times S_i}{1} = \left(1 - \sum_i F_i\right) + \sum_i F_i \times S_i$$

where $\left(1 - \sum_i F_i\right)$ is the unaffected fraction (i.e., as if the resulting execution time is normalized to 1).

“Reverse” Multiple Enhancements Amdahl's
Law

For the previous example, assuming the fractions given refer to the resulting
execution time after the enhancements were applied (not the original
execution time):

Speedup = (1 - .2 - .15 - .1) + .2 x 10 + .15 x 15 + .1 x 30
= .55 + 2 + 2.25 + 3

= 7.8
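
The "reverse" form can be sketched in Python the same way. Here each F_i is a fraction of the resulting (enhanced) execution time, which is normalized to 1, so the speedup equals the reconstructed original time; the function name is illustrative.

```python
def reverse_amdahl_multi(enhancements):
    """Speedup when each F_i is a fraction of the RESULTING execution time.

    With the resulting time normalized to 1, the original time is
    (1 - sum F_i) + sum F_i x S_i, and that value is also the speedup.
    """
    total_fraction = sum(f for f, _ in enhancements)
    if total_fraction > 1.0:
        raise ValueError("fractions must not sum to more than 1")
    return (1.0 - total_fraction) + sum(f * s for f, s in enhancements)


result = reverse_amdahl_multi([(0.20, 10), (0.15, 15), (0.10, 30)])
print(f"{result:.1f}")  # 7.8
```

The same (F_i, S_i) values give a much larger speedup than before, because the fractions now describe time that has already been shortened.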

Probable Conclusions
1. Total Number of instructions is definitely
not a good metric.
2. MIPS is a good metric.

Conclusion
Total execution time is always a better metric, as
it sums up all factors and cannot be replaced
by considering any one of
1. MIPS
2. Total number of instructions
3. Clock rate
alone.
Measuring Performance
Now that we know that performance
depends on the program, which
program(s) should be used to measure
performance?
Benchmarks.
Benchmarks
• Are a set of programs that are specifically
chosen for measuring performance.
• Types of benchmarks
– Real programs
– Kernel
• Extract the key feature from a program
– Component
– Synthetic
• Dhrystone – integer and string arithmetic
• Whetstone – floating point
– I/O
– Parallel
Challenges
1. Vendors may tinker with benchmarks to
make them run better on their platform.
At times this is permitted.
2. Give a data set rather than a single
performance number.
3. Concentrate only on computational
power.
Popular Benchmarks
• SPEC - Standard Performance Evaluation Corporation
– Floating point
– Integer
– Web
– Graphics
• TPC – Transaction Processing Performance Council
– Web Server
– Transaction Processing
– Decision Support Systems
• BAPCo – Business Applications Performance Corporation
– Popular business applications
• EEMBC – Embedded Microprocessor Benchmark Consortium
– Embedded Applications

Statistical Summarization of Data
• For the response time metric: arithmetic mean
• For the throughput metric: harmonic mean or geometric mean
• SPEC uses the geometric mean
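
The three means are available in Python's standard `statistics` module (`geometric_mean` requires Python 3.8 or later). The measurements below are hypothetical and serve only to show which mean is applied to which kind of data.

```python
from statistics import geometric_mean, harmonic_mean, mean

# Hypothetical measurements for three benchmark programs.
times_s = [2.0, 4.0, 8.0]          # response times in seconds
rates_per_s = [10.0, 20.0, 40.0]   # throughputs in jobs per second
ratios = [1.0, 2.0, 4.0]           # performance ratios against a reference machine

print("arithmetic mean of times:", mean(times_s))
print("harmonic mean of rates:  ", harmonic_mean(rates_per_s))
print("geometric mean of ratios:", geometric_mean(ratios))
```

The arithmetic mean of times tracks total execution time, the harmonic mean does the same for rates, and the geometric mean of normalized ratios does not depend on which machine is chosen as the reference.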

Reference
• Computer Organization and Design The
hardware/software interface by David
A. Patterson, University of California,
Berkeley and John L. Hennessy,
Stanford University

17 pages

Module-2 Introduction and Performance Analysis


Module-2: Computer Performance and Measurement

Does one imply the other?

• Since execution time is the reciprocal of

“the throughput of X is 1.3 times higher than Y”

• Thus the performance ratio is 15/10=1.5
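The ratio above can be checked with a minimal sketch. The function name `times_faster` is a hypothetical helper, and the two execution times (10 s for A, 15 s for B) are the ones used in the relative-performance example.

```python
def times_faster(time_slow_s: float, time_fast_s: float) -> float:
    """Return n such that the faster machine is n times faster.

    Performance is the reciprocal of execution time, so the performance
    ratio equals the inverse ratio of the execution times.
    """
    if time_slow_s <= 0 or time_fast_s <= 0:
        raise ValueError("execution times must be positive")
    return time_slow_s / time_fast_s


# Computer A takes 10 s, computer B takes 15 s for the same program.
ratio = times_faster(time_slow_s=15.0, time_fast_s=10.0)
print(f"A is {ratio:.1f} times faster than B")
```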

• The number of instructions can be reduced by:

CPU _ TIME Clock _ Cycles _ Needed * Clock _ Cycle _ Time

No. _ Instructions * Clocks _ Per _ instruction

Is this response time or throughput?

• A machine instruction comprises a number of micro-operations

Instruction fetch: Obtain instruction from program memory
Operand fetch: Locate and obtain operand data
Execute: Compute result value or status
Result store: Deposit results in storage (data memory or ...)
Next instruction: Determine successor or next instruction

– How do we compare the performance of two different machines?

• How can one measure the performance of this machine (CPU)

Performance_A = 1 / Execution_Time_A

(i.e., speedup is a ratio of performances and has no units)

• CPU execution time is the product of the above three

CPU clock cycles = Instruction count x CPI

This equation is commonly known as the CPU performance equation

T = I x CPI x C

where I = instruction count, CPI = cycles per instruction, C = clock cycle time (1 / clock rate)
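As a rough sketch of the CPU performance equation, the helper below multiplies the three factors. The function names and the workload numbers (1,000,000 instructions, CPI of 2.0, a 100 MHz clock) are illustrative assumptions, not values taken from the slides.

```python
def cycle_time_from_rate(clock_rate_hz: float) -> float:
    """Clock cycle time C is the reciprocal of the clock rate."""
    if clock_rate_hz <= 0:
        raise ValueError("clock rate must be positive")
    return 1.0 / clock_rate_hz


def cpu_time_seconds(instruction_count: int, cpi: float, cycle_time_s: float) -> float:
    """CPU time T = I x CPI x C, with C given in seconds per cycle."""
    return instruction_count * cpi * cycle_time_s


# Hypothetical workload, chosen only for illustration.
instructions = 1_000_000
average_cpi = 2.0
clock_rate_hz = 100e6  # 100 MHz

t = cpu_time_seconds(instructions, average_cpi, cycle_time_from_rate(clock_rate_hz))
print(f"CPU time: {t * 1e3:.1f} ms")
```

Keeping the cycle time and the clock rate as separate helpers makes it harder to mix up the two equivalent forms, T = I x CPI x C and T = I x CPI / clock rate.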

• Using the same program with these changes:

Speedup = (10,000,000 x 2.5 x 5x10^-9) / (9,500,000 x 3 x 3.33x10^-9) = 0.125 / 0.095 ≈ 1.32
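The arithmetic in this speedup can be reproduced with a short sketch. The variable names are assumptions; the numbers are the instruction counts, CPI values, and clock cycle times that appear in the expression above.

```python
# Old design: 10,000,000 instructions, CPI 2.5, 5 ns clock cycle.
old_time_s = 10_000_000 * 2.5 * 5e-9

# New design: 9,500,000 instructions, CPI 3, 3.33 ns clock cycle.
new_time_s = 9_500_000 * 3 * 3.33e-9

# Speedup is old execution time divided by new execution time.
speedup = old_time_s / new_time_s

print(f"old = {old_time_s:.3f} s, new = {new_time_s:.3f} s, speedup = {speedup:.2f}")
```

Note that the new design has a higher CPI yet still wins overall, because the shorter clock cycle and the lower instruction count outweigh it.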

Ci = count of instructions of type i executed

Also, executed instruction count I = Σ Ci

• Two code sequences have the following instruction

• CPU cycles for sequence 1 = 2 x 1 + 1 x 2 + 2 x 3 = 10 cycles
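A small sketch of the same cycle count, assuming the terms in the sum are (instruction count x CPI) for three instruction classes. The class labels "A", "B", "C" and the function name are hypothetical; only sequence 1 is reproduced here.

```python
# Assumed CPI per instruction class, read off the sum above.
CPI_BY_CLASS = {"A": 1, "B": 2, "C": 3}


def total_cycles(counts: dict) -> int:
    """Sum of Ci x CPIi over all instruction classes."""
    return sum(count * CPI_BY_CLASS[cls] for cls, count in counts.items())


# Sequence 1: 2 class-A, 1 class-B and 2 class-C instructions.
sequence_1 = {"A": 2, "B": 1, "C": 2}

cycles = total_cycles(sequence_1)
instructions = sum(sequence_1.values())
average_cpi = cycles / instructions

print(f"cycles = {cycles}, instructions = {instructions}, CPI = {average_cpi}")
```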

Fi = Ci / total executed instruction count = Ci / I

Op    Freq, Fi    CPIi    CPIi x Fi    % Time

Each metric has a purpose, and each can be misused.

• Under what conditions can the MIPS rating be

2. The same ISA is used

 (Thus the resulting programs used to run on the CPUs and

Instruction class CPI

• For a given high-level language program, two compilers produced

Instruction counts (in millions)

• The machine is assumed to run at a clock rate of 100 MHz.

CPI = CPU execution cycles / Instruction count

CPU time = Instruction count x CPI / Clock rate

MFLOPS = Number of floating-point operations / (Execution time x 10^6)
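The MFLOPS definition translates directly into code. This is a minimal sketch: the function name `mflops` and the sample workload (50 million floating-point operations in 2 seconds) are made-up values for illustration only.

```python
def mflops(fp_operations: int, execution_time_s: float) -> float:
    """MFLOPS = floating-point operations / (execution time x 10^6)."""
    if execution_time_s <= 0:
        raise ValueError("execution time must be positive")
    return fp_operations / (execution_time_s * 1e6)


# Hypothetical program: 50 million floating-point operations in 2 seconds.
rating = mflops(fp_operations=50_000_000, execution_time_s=2.0)
print(f"{rating:.1f} MFLOPS")
```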

• MFLOPS rating is a better comparison measure between different machines

• Program-dependent: Different programs have different percentages of floating-

• Dependent on the type of floating-point operations present in the program.

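Speedup = Performance for entire task using the enhancement / Performance for entire task without the enhancement

or Speedup = Execution time without the enhancement / Execution time for entire task using the enhancement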

Unaffected fraction: (1 - F); enhanced fraction after speedup: F/S

Speedup(E) = Execution time without enhancement E / Execution time with enhancement E = 1 / ((1 - F) + F/S)

old CPI 2.2

Hence multiplication should be 16 times faster

Execution time with enhancement = 100/5 = 20 seconds

20 seconds = (100 - 80 seconds) + 80 seconds / s

No amount of multiplication speed improvement can achieve this.
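The two cases above can be sketched by solving the same equation for s. The function name is hypothetical, and the overall target of 4 used in the first call is an assumption chosen to be consistent with the "16 times" figure; the 100-second total and 80 seconds of multiplication come from the equation above.

```python
from typing import Optional


def required_speedup(total_s: float, affected_s: float, overall_target: float) -> Optional[float]:
    """Solve total/target = (total - affected) + affected/s for s.

    Returns None when the unaffected part alone already uses up the
    whole target time, so no finite s can reach the target.
    """
    target_time_s = total_s / overall_target
    unaffected_s = total_s - affected_s
    remaining_s = target_time_s - unaffected_s
    if remaining_s <= 0:
        return None
    return affected_s / remaining_s


# Assumed target of 4x overall: 25 = 20 + 80/s
s_for_4x = required_speedup(100, 80, 4)

# Target of 5x overall: 20 = 20 + 80/s has no solution
s_for_5x = required_speedup(100, 80, 5)

print(s_for_4x, s_for_5x)
```

The `None` result mirrors the point of the example: the 20 seconds that multiplication does not touch put a hard ceiling of 100/20 = 5 on the overall speedup, and that ceiling is only approached, never reached.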

• Suppose that enhancement Ei accelerates a fraction Fi of

Original Execution Time

Note: All fractions Fi refer to original execution time before the

• Speedup = 1 / [(1 - .2 - .15 - .1) + .2/10 + .15/15 + .1/30] ≈ 1.71
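A hedged sketch of this multi-enhancement form of Amdahl's Law, assuming the enhancements apply to disjoint fractions of the original execution time. The function name `amdahl_speedup` is an assumption; the fractions and speedups are the ones in the expression above.

```python
def amdahl_speedup(fractions, speedups):
    """Overall speedup when enhancement i speeds up fraction Fi by Si.

    All fractions refer to the ORIGINAL execution time and must not overlap.
    """
    if len(fractions) != len(speedups):
        raise ValueError("need one speedup per fraction")
    unaffected = 1.0 - sum(fractions)
    if unaffected < 0:
        raise ValueError("fractions must sum to at most 1")
    enhanced = sum(f / s for f, s in zip(fractions, speedups))
    return 1.0 / (unaffected + enhanced)


overall = amdahl_speedup(fractions=[0.2, 0.15, 0.1], speedups=[10, 15, 30])
print(f"overall speedup = {overall:.2f}")
```

Even with individual speedups of 10, 15 and 30, the result stays below 2 because 55% of the original time is left untouched.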

Unaffected fraction: .55; F1 = .2, F2 = .15, F3 = .1

Note: All fractions Fi refer to original execution time.

• If for each enhancement Si, the fraction Fi it affects is given as a fraction of

Resulting Execution Time

i.e., as if the resulting execution time is normalized to 1

For the previous example assuming fractions given refer to resulting

Speedup = (1 - .2 - .15 - .1) + .2 x 10 + .15 x 15 + .1 x 30 = 7.8
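For contrast with the original-time form, here is a minimal sketch of the case where the fractions refer to the resulting execution time, normalized to 1. The function name is hypothetical; the fractions and speedups are those in the expression above.

```python
def speedup_from_resulting_fractions(fractions, speedups):
    """Overall speedup when each Fi is a fraction of the RESULTING time.

    With the resulting time normalized to 1, the original time is the
    unaffected part plus each enhanced part scaled back up by its Si.
    """
    if len(fractions) != len(speedups):
        raise ValueError("need one speedup per fraction")
    unaffected = 1.0 - sum(fractions)
    if unaffected < 0:
        raise ValueError("fractions must sum to at most 1")
    original_time = unaffected + sum(f * s for f, s in zip(fractions, speedups))
    return original_time / 1.0  # resulting time is normalized to 1


overall = speedup_from_resulting_fractions([0.2, 0.15, 0.1], [10, 15, 30])
print(f"overall speedup = {overall:.1f}")
```

The same three fractions give a much larger speedup here than in the original-time form, which is why it matters to state which execution time the fractions refer to.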

For Throughput metric
