
Module-2 Introduction and Performance Analysis

Uploaded by

Darth Vader
Module-2: Computer Performance and Measurement
Performance
• What do we mean by the performance of a computer?
– Hardware performance
– A program runs in less time
– A computer is faster when it completes more tasks per unit time

Performance
Two important metrics:
• Response time (also called latency or execution time): the time taken to complete a single job, i.e. the time between the start and the completion of an event. Smaller is better.
• Throughput: the number of jobs done per unit of time. Larger is better.

Does one imply the other?

• Yes. E.g., if latency decreases, throughput will increase.
• No. E.g., in pipelining, latency may have to be increased to increase throughput!

Comparing Design Alternatives
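• “X is n times faster than Y” means:

$n = \dfrac{\text{Execution time}_Y}{\text{Execution time}_X}$

• Since execution time is the reciprocal of performance, the following relationship holds:

$n = \dfrac{\text{Execution time}_Y}{\text{Execution time}_X} = \dfrac{\text{Performance}_X}{\text{Performance}_Y}$

• For throughput, the same comparison is phrased as, e.g., “the throughput of X is 1.3 times higher than Y”.
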
Relative Performance
• If computer A runs a program in 10 sec and computer B runs the same program in 15 sec, how much faster is A than B?
• We know that A is n times faster than B if

$\dfrac{\text{Performance}_A}{\text{Performance}_B} = \dfrac{\text{Execution time}_B}{\text{Execution time}_A} = n$

• Thus the performance ratio is 15/10 = 1.5.
• A is therefore 1.5 times faster than B.

Measuring Performance
• Time is not always the metric quoted in
comparing the performance of computers.
• Reliable measure of performance is the
execution time of real programs.
• What is Time?
– wall-clock time
– response time or elapsed time
– disk accesses, memory accesses, input/output activities
– operating system overhead
– CPU time (user CPU time or system CPU time) or I/O time

Measuring Performance
• Computer designers measure how fast the hardware can perform basic functions.
• Computers are constructed using a clock that runs at a constant rate and determines when events take place in the hardware.
• Key terms: clock cycles, clock period, clock rate (the inverse of the clock period).

How can we Improve Performance ?

• The number of instructions can be reduced by:
– Better instruction set architecture (ISA)
– Better compiler
– Better algorithm
• Clocks per instruction (CPI) can be reduced by:
– Better hardware design
– Making the common case faster
• Clock rate can be increased by:
– Hardware design

CPU Performance Equation

$\text{CPU time} = \text{Clock cycles needed} \times \text{Clock cycle time}$

$\text{CPU time} = \dfrac{\text{No. of instructions} \times \text{Clocks per instruction}}{\text{Clock rate}}$

Does this measure response time or throughput?

CPU Performance Evaluation: CPI
• Most computers run synchronously, utilizing a CPU clock running at a constant clock rate (or clock frequency f), where:
  Clock rate = 1 / clock cycle time, i.e. f = 1 / C
• The CPU clock rate depends on the specific CPU organization (design) and the hardware implementation technology (VLSI) used.
• A machine instruction is composed of a number of micro-operations, which vary in number and complexity depending on the instruction type and the exact CPU organization (design).
– A micro-operation is an elementary hardware operation that can be performed during one CPU clock cycle.
– This corresponds to one micro-instruction in micro-programmed CPUs.
– Examples: register operations (shift, load, clear, increment) and ALU operations (add, subtract, etc.).
• Thus a single machine instruction may take one or more CPU cycles to complete; this number is termed the cycles per instruction (CPI).
• Average (or effective) CPI of a program: the average CPI of all instructions executed in the program on a given CPU design.
• Units: cycles/sec = hertz (Hz); MHz = 10^6 Hz; GHz = 10^9 Hz.
• Instructions per cycle = IPC = 1/CPI.

Generic CPU Instruction Processing Steps

1. Instruction fetch: obtain the instruction from program memory. The program counter (PC) points to the instruction to be processed.
2. Instruction decode: determine the required actions and the instruction size.
3. Operand fetch: locate and obtain operand data from data memory or registers.
4. Execute: compute the result value or status.
5. Result store: deposit results in storage (data memory or register) for later use.
6. Next instruction: determine the successor instruction (i.e. update the PC to fetch the next instruction to be processed).

Computer Performance Measures: Program Execution Time
• A specific program compiled to run on a specific machine (CPU) “A” has the following parameters:
– I: the total executed instruction count of the program.
– CPI: the average number of cycles per instruction (average or effective CPI).
– C: the clock cycle time of machine “A”.

So, the questions are:
– How to compare the performance of two different machines?
– What factors affect performance?
– How to improve performance?

• How can one measure the performance of this machine (CPU) running this program?
– The machine (or CPU) is said to be faster, or to have better performance running this program, if the total execution time is shorter.

PerformanceA = 1 / Execution TimeA
(performance is measured in programs/second; execution time in seconds/program)

Comparing Computer Performance Using Execution Time
• To compare the performance of two machines (or CPUs) “A”, “B” running
the same program:
PerformanceA = 1 / Execution TimeA
PerformanceB = 1 / Execution TimeB
• Machine A is n times faster than machine B (or slower if n < 1) means:

$\text{Speedup} = n = \dfrac{\text{Performance}_A}{\text{Performance}_B} = \dfrac{\text{Execution Time}_B}{\text{Execution Time}_A}$

(i.e. speedup is a ratio of performance and has no units)

• Example:
For a given program:
Execution time on machine A: Execution TimeA = 1 second
Execution time on machine B: Execution TimeB = 10 seconds
Speedup = PerformanceA / PerformanceB = Execution TimeB / Execution TimeA
= 10 / 1 = 10

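The speedup ratio above can be sketched in a few lines of Python. This is only an illustration: the function name `speedup` and its argument names are assumptions made for this sketch, not part of the slides.

```python
def speedup(exec_time_a: float, exec_time_b: float) -> float:
    """Return n such that machine A is n times faster than machine B.

    Speedup = Performance_A / Performance_B = ExecTime_B / ExecTime_A.
    """
    if exec_time_a <= 0 or exec_time_b <= 0:
        raise ValueError("execution times must be positive")
    return exec_time_b / exec_time_a


# Machine A takes 1 s and machine B takes 10 s for the same program.
print(speedup(1.0, 10.0))   # 10.0
# Earlier example: A takes 10 s, B takes 15 s.
print(speedup(10.0, 15.0))  # 1.5
```

A result below 1 would mean that A is slower than B for that program.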
CPU Execution Time: The CPU Equation
• A program is composed of a number of executed instructions, I.
– Measured in: instructions/program
• The average executed instruction takes a number of cycles per instruction (CPI) to complete.
– Measured in: cycles/instruction
• The CPU has a fixed clock cycle time C = 1/clock rate.
– Measured in: seconds/cycle
• CPU execution time is the product of the above three parameters:

$\text{CPU time} = \dfrac{\text{Seconds}}{\text{Program}} = \dfrac{\text{Instructions}}{\text{Program}} \times \dfrac{\text{Cycles}}{\text{Instruction}} \times \dfrac{\text{Seconds}}{\text{Cycle}}$

T = I x CPI x C
where T is the execution time per program in seconds, I is the number of instructions executed, CPI is the average CPI for the program, and C is the CPU clock cycle time.
This equation is commonly known as the CPU performance equation.
CPU Performance Equation
For a given program executed on a given machine (CPU):
CPI = Total program execution cycles / Instruction count
(i.e. average or effective CPI)

CPU clock cycles = Instruction count x CPI

CPU execution time = CPU clock cycles x Clock cycle time
= Instruction count x CPI x Clock cycle time

CPU Execution Time: Example
• A Program is running on a specific machine (CPU) with
the following parameters:
– Total executed instruction count: 10,000,000 instructions
– Average CPI for the program: 2.5 cycles/instruction.
– CPU clock rate: 200 MHz.
What is the execution time for this program?

$\text{CPU time} = \dfrac{\text{Seconds}}{\text{Program}} = \dfrac{\text{Instructions}}{\text{Program}} \times \dfrac{\text{Cycles}}{\text{Instruction}} \times \dfrac{\text{Seconds}}{\text{Cycle}}$

CPU time = Instruction count x CPI x Clock cycle time
= 10,000,000 x 2.5 x (1 / clock rate)
= 10,000,000 x 2.5 x 5x10^-9
= 0.125 seconds

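As a quick check of the arithmetic, the CPU performance equation T = I x CPI x C can be evaluated directly. The sketch below is illustrative only; the name `cpu_time` and its parameters are assumptions, and the clock cycle time is derived as 1 / clock rate.

```python
def cpu_time(instruction_count: float, cpi: float, clock_rate_hz: float) -> float:
    """CPU time in seconds: T = I x CPI x C, with C = 1 / clock rate."""
    clock_cycle_time = 1.0 / clock_rate_hz  # seconds per cycle
    return instruction_count * cpi * clock_cycle_time


# 10,000,000 instructions, CPI = 2.5, 200 MHz clock.
t = cpu_time(10_000_000, 2.5, 200e6)
print(f"{t:.3f} seconds")  # 0.125 seconds
```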
Aspects of CPU Execution Time
CPU Time = Instruction count executed x CPI x Clock cycle time

T = I x CPI x C

• I (executed instruction count, IC) depends on: program used, compiler, ISA.
• CPI (average CPI) depends on: program used, compiler, ISA, CPU organization.
• C (clock cycle time, CCT) depends on: CPU organization, technology (VLSI).

Factors Affecting CPU Performance

$\text{CPU time} = \dfrac{\text{Seconds}}{\text{Program}} = \dfrac{\text{Instructions}}{\text{Program}} \times \dfrac{\text{Cycles}}{\text{Instruction}} \times \dfrac{\text{Seconds}}{\text{Cycle}}$

Factor                               Instruction Count   Cycles per Instruction   Clock Rate (1/CCT)
Program                              X                   X
Compiler                             X                   X
Instruction Set Architecture (ISA)   X                   X
Organization (CPU Design)                                X                        X
Technology (VLSI)                                                                 X

Performance Comparison: Example
• From the previous example: a program is running on a specific machine (CPU) with the following parameters:
– Total executed instruction count, I: 10,000,000 instructions
– Average CPI for the program: 2.5 cycles/instruction
– CPU clock rate: 200 MHz. Thus: CCT = 1/(200x10^6) = 5x10^-9 seconds
• Using the same program with these changes:
– A new compiler is used: new executed instruction count, I: 9,500,000; new CPI: 3.0
– Faster CPU implementation: new clock rate = 300 MHz
• What is the speedup with the changes?

$\text{Speedup} = \dfrac{\text{Old Execution Time}}{\text{New Execution Time}} = \dfrac{I_{old} \times CPI_{old} \times \text{Clock cycle time}_{old}}{I_{new} \times CPI_{new} \times \text{Clock cycle time}_{new}}$

Speedup = (10,000,000 x 2.5 x 5x10^-9) / (9,500,000 x 3 x 3.33x10^-9)
= 0.125 / 0.095 = 1.32
or 32% faster after the changes.

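The comparison can be reproduced with a short sketch that applies T = I x CPI x C to the old and new configurations and divides the two times. The helper name `cpu_time` and the variable names are assumptions chosen for this illustration.

```python
def cpu_time(instruction_count: float, cpi: float, clock_rate_hz: float) -> float:
    """CPU time in seconds: instruction count x CPI / clock rate."""
    return instruction_count * cpi / clock_rate_hz


old_time = cpu_time(10_000_000, 2.5, 200e6)  # original compiler and CPU
new_time = cpu_time(9_500_000, 3.0, 300e6)   # new compiler, faster clock

overall_speedup = old_time / new_time
print(f"old = {old_time:.3f} s, new = {new_time:.3f} s")
print(f"speedup = {overall_speedup:.2f}")  # speedup = 1.32
```

Note that the CPI got worse (2.5 to 3.0) while the instruction count and clock cycle time improved; only the product of all three terms decides the outcome.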
Instruction Types & CPI
• Given a program with n types or classes of instructions executed on a given CPU with the following characteristics:

Ci = Count of instructions of type i executed
CPIi = Cycles per instruction for type i (depends on CPU design)
i = 1, 2, …, n

Then:
CPI = CPU clock cycles / Instruction count I (i.e. average or effective CPI)

$\text{CPU clock cycles} = \sum_{i=1}^{n} (CPI_i \times C_i)$

Also, the executed instruction count is $I = \sum_{i=1}^{n} C_i$.

Instruction Types & CPI: An Example
• An instruction set has three instruction classes (for a specific CPU design):

Instruction class   CPI
A                   1
B                   2
C                   3

• Two code sequences have the following instruction counts for each instruction class:

Code sequence   A   B   C
1               2   1   2
2               4   1   1

• CPU cycles for sequence 1 = 2 x 1 + 1 x 2 + 2 x 3 = 10 cycles
  CPI for sequence 1 = clock cycles / instruction count = 10 / 5 = 2 (effective CPI)
• CPU cycles for sequence 2 = 4 x 1 + 1 x 2 + 1 x 3 = 9 cycles
  CPI for sequence 2 = 9 / 6 = 1.5

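The per-class bookkeeping in this example lends itself to a small sketch. The function name `effective_cpi` and the dictionary layout are assumptions for illustration; the numbers are the ones from the two code sequences.

```python
def effective_cpi(counts: dict, cpi_per_class: dict) -> tuple:
    """Return (total CPU cycles, effective CPI) for one code sequence.

    cycles = sum over classes of CPI_i x C_i, and CPI = cycles / sum of C_i.
    """
    total_cycles = sum(counts[c] * cpi_per_class[c] for c in counts)
    total_instructions = sum(counts.values())
    return total_cycles, total_cycles / total_instructions


cpi_per_class = {"A": 1, "B": 2, "C": 3}
sequence_1 = {"A": 2, "B": 1, "C": 2}
sequence_2 = {"A": 4, "B": 1, "C": 1}

print(effective_cpi(sequence_1, cpi_per_class))  # (10, 2.0)
print(effective_cpi(sequence_2, cpi_per_class))  # (9, 1.5)
```

Sequence 2 executes more instructions (6 versus 5) yet needs fewer cycles, because it uses more of the cheap class A instructions.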
Instruction Frequency & CPI
• Given a program with n types or classes of instructions with the following characteristics (i = 1, 2, …, n):
Ci = Count of instructions of type i executed
CPIi = Average cycles per instruction of type i
Fi = Frequency or fraction of instruction type i executed
   = Ci / total executed instruction count = Ci / I
where the executed instruction count is $I = \sum_{i=1}^{n} C_i$

Then:

$CPI = \sum_{i=1}^{n} (CPI_i \times F_i)$ (i.e. average or effective CPI)

Fraction of total execution time for instructions of type i = $\dfrac{CPI_i \times F_i}{CPI}$

Instruction Type Frequency & CPI: A RISC Example
Program profile or executed instruction mix (given, typical mix) for a base machine (Reg / Reg). The CPIi values depend on the CPU design.

Op       Freq, Fi   CPIi   CPIi x Fi   % Time (= CPIi x Fi / CPI)
ALU      50%        1      .5          23% = .5/2.2
Load     20%        5      1.0         45% = 1/2.2
Store    10%        3      .3          14% = .3/2.2
Branch   20%        2      .4          18% = .4/2.2
                           Sum = 2.2

$CPI = \sum_{i=1}^{n} (CPI_i \times F_i)$ (i.e. average or effective CPI)

CPI = .5 x 1 + .2 x 5 + .1 x 3 + .2 x 2
= .5 + 1 + .3 + .4 = 2.2

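The table can be recomputed from the instruction mix alone. The following is a minimal sketch; `cpi_from_mix` and the `(frequency, cycles)` tuple layout are assumptions made for this illustration.

```python
def cpi_from_mix(mix: dict) -> tuple:
    """Return (effective CPI, fraction of execution time per instruction type).

    mix maps an operation name to (frequency F_i, cycles CPI_i).
    """
    cpi = sum(freq * cycles for freq, cycles in mix.values())
    time_share = {op: freq * cycles / cpi for op, (freq, cycles) in mix.items()}
    return cpi, time_share


risc_mix = {
    "ALU": (0.50, 1),
    "Load": (0.20, 5),
    "Store": (0.10, 3),
    "Branch": (0.20, 2),
}

cpi, time_share = cpi_from_mix(risc_mix)
print(f"CPI = {cpi:.1f}")
for op, share in time_share.items():
    print(f"{op}: {share:.0%} of execution time")
```

Loads are only 20% of the executed instructions but account for roughly 45% of the execution time, which is why they are a natural target for enhancement.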
Metrics of Computer Performance (Measures)
Metrics by system level, from top to bottom:
• Application: execution time (target workload, SPEC, etc.)
• Programming language, compiler, ISA: (millions of) instructions per second, MIPS; (millions of) floating-point (F.P.) operations per second, MFLOP/s
• Datapath, control: megabytes per second
• Function units, transistors, wires, pins: cycles per second (clock rate)

Each metric has a purpose, and each can be misused.

Choosing Programs To Evaluate
Performance
Levels of programs or benchmarks that could be used to evaluate
performance:
– Actual Target Workload: Full applications that run on the
target machine.
– Real Full Program-based Benchmarks:
• Select a specific mix or suite of programs that are typical of targeted applications or workload (e.g. SPEC95, SPEC CPU2000).
– Small “Kernel” Benchmarks:
• Key computationally-intensive pieces extracted from real programs.
– Examples: Matrix factorization, FFT, tree search, etc.
• Best used to test specific aspects of the machine.

– Microbenchmarks:
• Small, specially written programs to isolate a specific aspect of performance characteristics: processing (integer, floating point), local memory, input/output, etc.

Computer Performance Measures: MIPS (Million Instructions Per Second) Rating
• For a specific program running on a specific CPU, the MIPS rating is a measure of how many millions of instructions are executed per second:
MIPS Rating = Instruction count / (Execution time x 10^6)
= Instruction count / (CPU clocks x Cycle time x 10^6)
= (Instruction count x Clock rate) / (Instruction count x CPI x 10^6)
= Clock rate / (CPI x 10^6)
• Major problem with the MIPS rating: as shown above, the MIPS rating does not account for the count of instructions executed (I).
– A higher MIPS rating in many cases may not mean higher performance or better execution time, e.g. due to compiler design variations.
• In addition, the MIPS rating:
– Does not account for the instruction set architecture (ISA) used.
• Thus it cannot be used to compare computers/CPUs with different instruction sets.
– Is easy to abuse: the program used to get the MIPS rating is often omitted.
• Often the peak MIPS rating is provided for a given CPU, obtained using a program composed entirely of instructions with the lowest CPI for the given CPU design, which does not represent real programs.

Computer Performance Measures :
MIPS (Million Instructions Per Second) Rating

• Under what conditions can the MIPS rating be used to compare the performance of different CPUs?
• The MIPS rating is only valid for comparing the performance of different CPUs provided that the following conditions are satisfied:
1. The same program is used (actually this applies to all performance metrics).
2. The same ISA is used.
3. The same compiler is used.
(Thus the resulting programs (binaries) used to run on the CPUs and obtain the MIPS rating are identical at the machine code level, including the same instruction count.)

Compiler Variations, MIPS & Performance:
An Example
• For a machine (CPU) with instruction classes:

Instruction class   CPI
A                   1
B                   2
C                   3

• For a given high-level language program, two compilers produced the following executed instruction counts (in millions) for each instruction class:

Code from:    A    B   C
Compiler 1    5    1   1
Compiler 2    10   1   1

• The machine is assumed to run at a clock rate of 100 MHz.

Compiler Variations, MIPS & Performance:
An Example (Continued)
MIPS = Clock rate / (CPI x 10^6) = (100 x 10^6) / (CPI x 10^6)

CPI = CPU execution cycles / Instruction count

$\text{CPU clock cycles} = \sum_{i=1}^{n} (CPI_i \times C_i)$

CPU time = Instruction count x CPI / Clock rate

• For compiler 1:
– CPI1 = (5 x 1 + 1 x 2 + 1 x 3) / (5 + 1 + 1) = 10 / 7 = 1.43
– MIPS Rating1 = (100 x 10^6) / (1.428 x 10^6) = 70.0 MIPS
– CPU time1 = ((5 + 1 + 1) x 10^6 x 1.43) / (100 x 10^6) = 0.10 seconds
• For compiler 2:
– CPI2 = (10 x 1 + 1 x 2 + 1 x 3) / (10 + 1 + 1) = 15 / 12 = 1.25
– MIPS Rating2 = (100 x 10^6) / (1.25 x 10^6) = 80.0 MIPS
– CPU time2 = ((10 + 1 + 1) x 10^6 x 1.25) / (100 x 10^6) = 0.15 seconds

The MIPS rating indicates that compiler 2 is better, while in reality the code produced by compiler 1 is faster.

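The mismatch between MIPS rating and execution time can be reproduced with a brief sketch. The function `analyze` and its argument names are assumptions for illustration; the instruction counts, per-class CPIs and the 100 MHz clock are the ones given in the example.

```python
def analyze(counts_millions: dict, cpi_per_class: dict, clock_rate_hz: float) -> tuple:
    """Return (CPI, MIPS rating, CPU time in seconds) for one compiled program."""
    instructions = sum(counts_millions.values()) * 1e6
    cycles = sum(counts_millions[c] * cpi_per_class[c] for c in counts_millions) * 1e6
    cpi = cycles / instructions
    mips = clock_rate_hz / (cpi * 1e6)   # MIPS = clock rate / (CPI x 10^6)
    cpu_time = cycles / clock_rate_hz    # same as I x CPI / clock rate
    return cpi, mips, cpu_time


cpi_per_class = {"A": 1, "B": 2, "C": 3}
compilers = {
    "Compiler 1": {"A": 5, "B": 1, "C": 1},
    "Compiler 2": {"A": 10, "B": 1, "C": 1},
}

for name, counts in compilers.items():
    cpi, mips, seconds = analyze(counts, cpi_per_class, 100e6)
    print(f"{name}: CPI = {cpi:.2f}, MIPS = {mips:.1f}, CPU time = {seconds:.2f} s")
```

Compiler 2 earns the higher MIPS rating only because it executes more of the cheap class A instructions; its program still takes 50% longer to run.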
Computer Performance Measures: MFLOPS (Million FLOating-Point Operations Per Second)
• A floating-point operation is an addition, subtraction, multiplication, or division operation applied to numbers represented by a single- or double-precision floating-point representation.
• MFLOPS, for a specific program running on a specific computer, is a measure of millions of floating-point operations (megaflops) per second:

MFLOPS = Number of floating-point operations / (Execution time x 10^6)

• The MFLOPS rating is a better comparison measure between different machines than the MIPS rating.
– Applicable even if ISAs are different.
• Program-dependent: different programs have different percentages of floating-point operations present, e.g. compilers have no floating-point operations and yield a MFLOPS rating of zero.
• Dependent on the type of floating-point operations present in the program.
– Peak MFLOPS rating for a CPU: obtained using a program composed entirely of the simplest floating-point instructions (with the lowest CPI) for the given CPU design, which does not represent real floating-point programs.

Quantitative Principles of Computer Design
Amdahl’s Law:
The performance gain from improving some portion of a computer (i.e. using some enhancement) is calculated by:

$\text{Speedup} = \dfrac{\text{Performance for entire task using the enhancement}}{\text{Performance for entire task without using the enhancement}}$

or

$\text{Speedup} = \dfrac{\text{Execution time for entire task without the enhancement}}{\text{Execution time for entire task using the enhancement}}$

Recall: Performance = 1 / Execution Time

Performance Enhancement Calculations: Amdahl's Law
• The performance enhancement possible due to a given design improvement is limited by the amount that the improved feature is used.
• Amdahl’s Law: performance improvement or speedup due to enhancement E:

$\text{Speedup}(E) = \dfrac{\text{Execution Time without } E}{\text{Execution Time with } E} = \dfrac{\text{Performance with } E}{\text{Performance without } E}$

– Suppose that enhancement E accelerates a fraction F of the original execution time by a factor S and the remainder of the time is unaffected. Then:

Execution Time with E = ((1 - F) + F/S) x Execution Time without E

Hence the speedup is given by:

$\text{Speedup}(E) = \dfrac{\text{Execution Time without } E}{((1 - F) + F/S) \times \text{Execution Time without } E} = \dfrac{1}{(1 - F) + F/S}$

Note: F (fraction of execution time enhanced) refers to the original execution time before the enhancement is applied.

Pictorial Depiction of Amdahl’s Law
Enhancement E accelerates fraction F of the original execution time by a factor of S.

Before: execution time without enhancement E (i.e. before the enhancement is applied)
• Shown normalized to 1 = (1 - F) + F
• Unaffected fraction: (1 - F); affected fraction: F

After: execution time with enhancement E
• Unaffected fraction: (1 - F), unchanged; the affected fraction shrinks to F/S

$\text{Speedup}(E) = \dfrac{\text{Execution Time without enhancement } E}{\text{Execution Time with enhancement } E} = \dfrac{1}{(1 - F) + F/S}$

What if the fraction given is after the enhancement has been applied? How would you solve the problem (i.e. find an expression for the speedup)?

Performance Enhancement Example
• For the RISC machine with the following instruction mix given earlier:

Op       Freq   Cycles (CPIi)   CPIi x Fi   % Time
ALU      50%    1               .5          23%
Load     20%    5               1.0         45%
Store    10%    3               .3          14%
Branch   20%    2               .4          18%
CPI = 2.2

If a CPU design enhancement improves the CPI of load instructions from 5 to 2, what is the resulting performance improvement from this enhancement?
Fraction enhanced = F = 45% or .45
Unaffected fraction = 1 - F = 100% - 45% = 55% or .55
Factor of enhancement = S = 5/2 = 2.5
Using Amdahl’s Law:

$\text{Speedup}(E) = \dfrac{1}{(1 - F) + F/S} = \dfrac{1}{.55 + .45/2.5} = 1.37$

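Amdahl's Law for a single enhancement is a one-line formula, sketched below. The name `amdahl_speedup` and its parameters are assumptions made for this illustration.

```python
def amdahl_speedup(fraction_enhanced: float, factor: float) -> float:
    """Speedup(E) = 1 / ((1 - F) + F / S).

    fraction_enhanced (F) is measured against the ORIGINAL execution time.
    """
    if not 0.0 <= fraction_enhanced <= 1.0:
        raise ValueError("F must be between 0 and 1")
    if factor <= 0:
        raise ValueError("S must be positive")
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / factor)


# Load instructions take 45% of the time and become 2.5x faster.
print(f"{amdahl_speedup(0.45, 2.5):.2f}")  # 1.37
```

The 45% figure is rounded; using the unrounded fraction 1/2.2 gives 1.375, which matches the ratio of old CPI to new CPI (2.2 / 1.6).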
An Alternative Solution Using CPU Equation

Op       Freq   Cycles (CPIi)   CPIi x Fi   % Time
ALU      50%    1               .5          23%
Load     20%    5               1.0         45%
Store    10%    3               .3          14%
Branch   20%    2               .4          18%
CPI = 2.2

If a CPU design enhancement improves the CPI of load instructions from 5 to 2, what is the resulting performance improvement from this enhancement?
The new CPI of load is now 2 instead of 5.
Old CPI = 2.2
New CPI = .5 x 1 + .2 x 2 + .1 x 3 + .2 x 2 = 1.6

$\text{Speedup}(E) = \dfrac{\text{Original Execution Time}}{\text{New Execution Time}} = \dfrac{\text{Instruction count} \times \text{old CPI} \times \text{clock cycle}}{\text{Instruction count} \times \text{new CPI} \times \text{clock cycle}} = \dfrac{\text{old CPI}}{\text{new CPI}} = \dfrac{2.2}{1.6} = 1.37$

This is the same speedup obtained from Amdahl’s Law in the first solution.

Performance Enhancement Example
• A program runs in 100 seconds on a machine (CPU), with multiply operations responsible for 80 seconds of this time. By how much must the speed of multiplication be improved to make the program four times faster?

Desired speedup = 4 = 100 / Execution time with enhancement
Execution time with enhancement = 100/4 = 25 seconds
25 seconds = (100 - 80) seconds + 80 seconds / S
25 seconds = 20 seconds + 80 seconds / S
→ 5 seconds = 80 seconds / S
→ S = 80/5 = 16

Hence multiplication should be 16 times faster to get an overall speedup of 4.

Alternatively, it can also be solved by finding the enhanced fraction of execution time, F = 80/100 = .8, and then solving Amdahl’s speedup equation for the desired enhancement factor S:

$\text{Speedup}(E) = \dfrac{1}{(1 - F) + F/S} = 4 = \dfrac{1}{(1 - .8) + .8/S} = \dfrac{1}{.2 + .8/S}$

Solving for S gives S = 16.

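The same algebra can be written as a small solver that works in seconds, as the first solution does. This is a sketch under stated assumptions: the name `required_factor` and the choice to return `None` when the target cannot be reached are illustrative, not part of the slides.

```python
def required_factor(total_time: float, enhanced_time: float, target_speedup: float):
    """Return the factor S by which the enhanced part must speed up, or None.

    Solves: total_time / target_speedup = (total_time - enhanced_time) + enhanced_time / S
    """
    target_time = total_time / target_speedup
    unaffected_time = total_time - enhanced_time
    budget = target_time - unaffected_time  # time left for the enhanced part
    if budget <= 0:
        return None  # the unaffected part alone already uses up the target time
    return enhanced_time / budget


print(required_factor(100, 80, 4))  # 16.0
```

Working in seconds rather than fractions keeps the arithmetic exact for these inputs.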
Performance Enhancement Example
• For the previous example, with a program running in 100 seconds on a machine with multiply operations responsible for 80 seconds of this time: by how much must the speed of multiplication be improved to make the program five times faster?

Desired speedup = 5 = 100 / Execution time with enhancement
Execution time with enhancement = 100/5 = 20 seconds
20 seconds = (100 - 80) seconds + 80 seconds / S
20 seconds = 20 seconds + 80 seconds / S
→ 0 = 80 seconds / S

No amount of multiplication speed improvement can achieve this.

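The result reflects an upper bound: as S grows without limit, the enhanced part shrinks toward zero and the speedup approaches total time / unaffected time, which here is 100 / 20 = 5. The sketch below illustrates this numerically; the function names are assumptions chosen for this example.

```python
def speedup_with_factor(total_time: float, enhanced_time: float, s: float) -> float:
    """Overall speedup when the enhanced part runs s times faster."""
    return total_time / ((total_time - enhanced_time) + enhanced_time / s)


def max_speedup(total_time: float, enhanced_time: float) -> float:
    """Limit of the speedup as s grows without bound: total / unaffected."""
    return total_time / (total_time - enhanced_time)


for s in (2, 16, 100, 10_000, 1_000_000):
    print(f"S = {s:>9}: speedup = {speedup_with_factor(100, 80, s):.4f}")

print(f"upper bound = {max_speedup(100, 80)}")  # upper bound = 5.0
```

Every finite S gives a speedup strictly below 5, so a fivefold improvement is out of reach by accelerating multiplication alone.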
Extending Amdahl's Law To Multiple Enhancements
(n enhancements, each affecting a different portion of execution time)

• Suppose that enhancement Ei accelerates a fraction Fi of the original execution time by a factor Si (i = 1, 2, …, n) and the remainder of the time is unaffected. Then:

$\text{Speedup} = \dfrac{\text{Original Execution Time}}{\left( (1 - \sum_i F_i) + \sum_i \dfrac{F_i}{S_i} \right) \times \text{Original Execution Time}}$

$\text{Speedup} = \dfrac{1}{(1 - \sum_i F_i) + \sum_i \dfrac{F_i}{S_i}}$

where $(1 - \sum_i F_i)$ is the unaffected fraction.

Note: All fractions Fi refer to the original execution time before the enhancements are applied.

What if the fractions given are after the enhancements were applied? How would you solve the problem (i.e. find an expression for the speedup)?

Amdahl's Law With Multiple Enhancements: Example
• Three CPU performance enhancements are proposed with the following speedups and percentages of the code execution time affected:
Speedup1 = S1 = 10   Percentage1 = F1 = 20%
Speedup2 = S2 = 15   Percentage2 = F2 = 15%
Speedup3 = S3 = 30   Percentage3 = F3 = 10%

• While all three enhancements are in place in the new design, each enhancement affects a different portion of the code and only one enhancement can be used at a time.
• What is the resulting overall speedup?

$\text{Speedup} = \dfrac{1}{(1 - \sum_i F_i) + \sum_i \dfrac{F_i}{S_i}}$

• Speedup = 1 / [(1 - .2 - .15 - .1) + .2/10 + .15/15 + .1/30]
= 1 / [.55 + .0333]
= 1 / .5833 = 1.71

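The multiple-enhancement form of Amdahl's Law can be sketched as a short function over (F, S) pairs. The name `amdahl_multi` and the list-of-tuples input are assumptions for illustration; the fractions are taken relative to the original execution time and are assumed not to overlap.

```python
def amdahl_multi(enhancements: list) -> float:
    """Speedup = 1 / ((1 - sum F_i) + sum F_i / S_i).

    enhancements is a list of (F_i, S_i) pairs; each F_i is a fraction of the
    ORIGINAL execution time and the fractions must not overlap.
    """
    total_fraction = sum(f for f, _ in enhancements)
    if not 0.0 <= total_fraction <= 1.0:
        raise ValueError("fractions must sum to a value between 0 and 1")
    new_time = (1.0 - total_fraction) + sum(f / s for f, s in enhancements)
    return 1.0 / new_time


proposed = [(0.20, 10), (0.15, 15), (0.10, 30)]
print(f"{amdahl_multi(proposed):.2f}")  # 1.71
```

Even with factors of 10 to 30, the overall speedup stays below 2 because 55% of the execution time is untouched.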
Pictorial Depiction of Example
Before: execution time with no enhancements = 1 (i.e. normalized to 1)
• Unaffected fraction: .55
• F1 = .2 (S1 = 10), F2 = .15 (S2 = 15), F3 = .1 (S3 = 30)

After: execution time with enhancements
• Unaffected fraction: .55 (unchanged)
• F1/S1 = .2/10 = .02, F2/S2 = .15/15 = .01, F3/S3 = .1/30 = .00333
• Total = .55 + .02 + .01 + .00333 = .5833

Speedup = 1 / .5833 = 1.71

Note: All fractions Fi refer to the original execution time.

What if the fractions given are after the enhancements were applied? How would you solve the problem?

“Reverse” Multiple Enhancements Amdahl's Law
• Multiple Enhancements Amdahl's Law assumes that the fractions given refer to the original execution time.
• If, for each enhancement Si, the fraction Fi it affects is given as a fraction of the resulting execution time after the enhancements were applied, then:

$\text{Speedup} = \dfrac{\left( (1 - \sum_i F_i) + \sum_i F_i \times S_i \right) \times \text{Resulting Execution Time}}{\text{Resulting Execution Time}}$

$\text{Speedup} = \dfrac{(1 - \sum_i F_i) + \sum_i F_i \times S_i}{1} = (1 - \sum_i F_i) + \sum_i F_i \times S_i$

where $(1 - \sum_i F_i)$ is the unaffected fraction, i.e. as if the resulting execution time is normalized to 1.



“Reverse” Multiple Enhancements Amdahl's
Law

For the previous example, assuming the fractions given refer to the resulting execution time after the enhancements were applied (not the original execution time), then:

Speedup = (1 - .2 - .15 - .1) + .2 x 10 + .15 x 15 + .1 x 30
= .55 + 2 + 2.25 + 3
= 7.8

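The “reverse” form works backwards: with the resulting execution time normalized to 1, each enhanced fraction is scaled back up by its factor to recover the original time. A minimal sketch, with `reverse_amdahl` as an assumed name:

```python
def reverse_amdahl(enhancements: list) -> float:
    """Speedup = (1 - sum F_i) + sum F_i x S_i.

    enhancements is a list of (F_i, S_i) pairs; each F_i is a fraction of the
    RESULTING execution time (after the enhancements were applied).
    """
    total_fraction = sum(f for f, _ in enhancements)
    if not 0.0 <= total_fraction <= 1.0:
        raise ValueError("fractions must sum to a value between 0 and 1")
    # Original time relative to a resulting time of 1.
    return (1.0 - total_fraction) + sum(f * s for f, s in enhancements)


proposed = [(0.20, 10), (0.15, 15), (0.10, 30)]
print(f"{reverse_amdahl(proposed):.1f}")  # 7.8
```

The same three (F, S) pairs give 1.71 when the fractions refer to the original time and 7.8 when they refer to the resulting time, so it matters which baseline a quoted fraction uses.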
Probable Conclusions
1. Total Number of instructions is definitely
not a good metric.
2. MIPS is a good metric.

Conclusion
Total execution time is always a better metric, as it sums up all factors, and it cannot be replaced by considering any one of the following alone:
1. MIPS
2. Total number of instructions
3. Clock rate

Measuring Performance
Now that we know that performance depends on the program, which program(s) should be used to measure performance?
Benchmarks.

Benchmarks
• A set of programs specifically chosen for measuring performance.
• Types of benchmarks:
– Real programs
– Kernel
• Extracts the key feature from a program
– Component
– Synthetic
• Dhrystone: integer and string arithmetic
• Whetstone: floating point
– I/O
– Parallel

Challenges
1. Vendors may tinker with benchmarks to make them run better on their platform. At times this is permitted.
2. Results should be given as a data set rather than a single performance number.
3. Benchmarks may concentrate only on computational power.

Popular Benchmarks
• SPEC - Standard Performance Evaluation Corporation
– Floating point
– Integer
– Web
– Graphics
• TPC – Transaction Processing Performance Council
– Web Server
– Transaction Processing
– Decision Support Systems
• BAPCo – Business Applications Performance Corporation
– Popular business applications
• EEMBC – Embedded Microprocessor Benchmark Consortium
– Embedded Applications

Statistical Summarization of Data
• For the response time metric: arithmetic mean.
• For the throughput metric: harmonic mean or geometric mean.
• SPEC uses the geometric mean.

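The three means can be computed with Python's standard `statistics` module (the `geometric_mean` function requires Python 3.8 or later). The sketch below uses made-up numbers purely for illustration; the benchmark values and the `summarize` helper are hypothetical, not measurements from the slides.

```python
import statistics


def summarize(times_s: list, rates: list, ratios: list) -> dict:
    """Summarize benchmark results with the mean suited to each metric."""
    return {
        # Response times (seconds): arithmetic mean.
        "arithmetic_mean_time": statistics.mean(times_s),
        # Throughput rates (jobs/second): harmonic mean.
        "harmonic_mean_rate": statistics.harmonic_mean(rates),
        # Normalized performance ratios: geometric mean.
        "geometric_mean_ratio": statistics.geometric_mean(ratios),
    }


# Hypothetical results for three benchmark programs.
summary = summarize(
    times_s=[2.0, 4.0, 6.0],
    rates=[2.0, 4.0, 8.0],
    ratios=[2.0, 4.0, 8.0],
)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

The harmonic mean of rates corresponds to total work divided by total time when each benchmark does the same amount of work, which is why it suits throughput-style metrics.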
Reference
• David A. Patterson (University of California, Berkeley) and John L. Hennessy (Stanford University), Computer Organization and Design: The Hardware/Software Interface.
