Cray Debugging and Profiling Tools Guide

The document provides an introduction and overview of various tools for profiling and debugging applications on Cray systems, including Perftools-lite for automatic profiling, Apprentice2 for sampling and event tracing analysis, and Reveal and LGDB for visualization and debugging. It describes the basic functionalities and usage of these tools to help users identify performance bottlenecks and optimize applications. The document also outlines a typical 3 step workflow of instrumentation, data collection, and analysis using the CrayPat toolset.

Short Introduction to Debugging/Profiling Tools
George Markomanolis
20 May 2015
Outline
!  Profiling – Cray tools
   !  Perftools-lite
   !  Apprentice2
   !  Reveal
!  Debugging
   !  LGDB
Performance Analysis

!  Why performance analysis?
   !  Investigate the bottlenecks of an application
   !  Identify potential improvements
   !  Make better use of the hardware

!  Profiling
   !  Sampling
      !  Lightweight
      !  Overhead depends on the sampling frequency
      !  Can lack resolution when function calls are short
   !  Event tracing
      !  Detailed information
      !  Captures every event
      !  Can capture communication events
      !  Drawbacks: overhead and large amounts of data
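
The trade-off between sampling and event tracing can be illustrated with a toy example. The following is a minimal Python sketch, not CrayPat code: the timeline, the function names, and the 10 ms sampling period are all hypothetical. It first sums exact per-function time the way event tracing would, then samples the same timeline at a fixed period.

```python
# Hypothetical timeline of one process: (function, start_ms, end_ms).
TIMELINE = [
    ("solve", 0.0, 40.0),
    ("tiny_helper", 40.0, 40.5),
    ("exchange", 40.5, 70.5),
    ("tiny_helper", 70.5, 71.0),
    ("solve", 71.0, 100.0),
]


def trace_profile(timeline):
    """Event tracing: every call is recorded, so time per function is exact."""
    totals = {}
    for name, start, end in timeline:
        totals[name] = totals.get(name, 0.0) + (end - start)
    return totals


def sample_profile(timeline, period_ms):
    """Sampling: note which function is running once per period."""
    end_time = timeline[-1][2]
    counts = {}
    num_samples = int(end_time // period_ms)
    for k in range(num_samples):
        t = (k + 0.5) * period_ms  # sample in the middle of each period
        for name, start, end in timeline:
            if start <= t < end:
                counts[name] = counts.get(name, 0) + 1
                break
    return counts


traced = trace_profile(TIMELINE)
sampled = sample_profile(TIMELINE, period_ms=10.0)
print("traced  (ms):     ", traced)
print("sampled (samples):", sampled)
```

In this toy run the two short tiny_helper calls never coincide with a sample, so they are missing from the sampled profile, while the traced profile reports them exactly. A shorter period would eventually catch them at the cost of more samples, which is the overhead-versus-resolution trade-off listed above.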
CrayPat overview

!  Assists the user with application performance analysis and optimization
   !  Provides concrete suggestions instead of just reporting
!  Basic functionality applies to all the compilers on the system
!  Requires no source code or Makefile modification (in most cases)
3 steps of CrayPat

!  Instrumentation
   !  Use pat_build to apply instrumentation to program binaries
!  Data collection
   !  Via execution
!  Analysis: Sampling/Tracing
   !  Use the tools pat_report, Cray Apprentice2, Reveal
   !  Automatic Performance Analysis (APA) combines the two approaches
   !  Loop profiling is a special flavor of event tracing
CrayPat – lite I

!  Provides automatic application performance statistics at the end of a job
!  Usage for NPB/LU
   !  vim config/make.def
      !  MPIF77 = ftn
   !  module load perftools-lite
   !  make clean
   !  Compile the LU benchmark, class C, for 64 MPI processes
      !  make LU NPROCS=64 CLASS=C
   !  sbatch execute_lu.sh
   !  Two files with the extensions .rpt and .ap2 are created
CrayPat – lite II

Table 1: Profile by Function Group and Function (top 10 functions shown)

  Samp% |   Samp |  Imb. |  Imb. |Group
        |        |  Samp | Samp% | Function

 100.0% | 1715.7 |    -- |    -- |Total
|--------------------------------------------
|  85.8% | 1472.3 |    -- |    -- |USER
||-------------------------------------------
||  39.0% |  668.7 |  56.3 |  7.9% |rhs_
||  10.0% |  171.7 |  25.3 | 13.0% |buts_
||   9.6% |  165.0 |  22.0 | 12.0% |jacld_
||   9.6% |  163.9 |  21.1 | 11.6% |blts_
||   9.4% |  161.9 |  23.1 | 12.7% |jacu_
||   3.7% |   63.7 |  27.3 | 30.5% |ssor_
||   3.2% |   54.5 |  31.5 | 37.2% |exchange_3_
||===========================================
|  14.2% |  243.1 |    -- |    -- |MPI
||-------------------------------------------
||   6.9% |  119.1 | 118.9 | 50.8% |MPI_RECV
||   4.1% |   69.8 |  64.2 | 48.7% |mpi_bcast
||   1.5% |   26.5 | 102.5 | 80.7% |mpi_wait
|============================================
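
The percentage columns of Table 1 can be reproduced from the sample counts. The Python sketch below does this under stated assumptions: that Samp is the average over PEs, that Imb. Samp is the maximum minus that average, and that the imbalance percentage is scaled by N/(N-1) for N = 64 PEs. These assumptions reproduce the printed values to one decimal place, but the pat_report documentation remains the authoritative definition.

```python
TOTAL_SAMPLES = 1715.7   # "Total" row of Table 1
NUM_PES = 64             # the LU run used 64 MPI processes

# (average samples, imbalance samples) copied from Table 1.
ROWS = {
    "rhs_": (668.7, 56.3),
    "ssor_": (63.7, 27.3),
    "MPI_RECV": (119.1, 118.9),
    "mpi_wait": (26.5, 102.5),
}


def sample_percent(samp, total=TOTAL_SAMPLES):
    """Share of all samples attributed to one function."""
    return 100.0 * samp / total


def imbalance_percent(avg_samp, imb_samp, num_pes=NUM_PES):
    """Assumed definition: (max - avg) / max, scaled by N / (N - 1)."""
    max_samp = avg_samp + imb_samp
    return 100.0 * (imb_samp / max_samp) * num_pes / (num_pes - 1)


for name, (avg, imb) in ROWS.items():
    print(f"{name:10s} Samp% = {sample_percent(avg):5.1f}"
          f"  Imb.% = {imbalance_percent(avg, imb):5.1f}")
```

Read this way, the 80.7% on mpi_wait says that the busiest PE collected far more samples in that routine than the average PE did.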
CrayPat – lite III

Table 2: Profile by Group, Function, and Line

  Samp% |   Samp |  Imb. |  Imb. |Group
        |        |  Samp | Samp% | Function
        |        |       |       |  Source
        |        |       |       |   Line
        |        |       |       |    PE=HIDE

 100.0% | 1715.7 |    -- |    -- |Total
|--------------------------------------------------------------------
|  85.8% | 1472.3 |    -- |    -- |USER
||-------------------------------------------------------------------
||  39.0% |  668.7 |    -- |    -- |rhs_
3|        |        |       |       | NPB3.3.1/NPB3.3-MPI/LU/rhs.f
||||-----------------------------------------------------------------
4|||   2.7% |   45.5 |  19.5 | 30.4% |line.43
4|||   2.2% |   37.0 |  16.0 | 30.6% |line.96
4|||   1.7% |   28.5 |  15.5 | 35.7% |line.228
4|||   1.8% |   31.6 |  11.4 | 26.9% |line.246
4|||   3.8% |   65.0 |  17.0 | 21.1% |line.336

File: rhs.f, lines 39 - 47

    do k = 1, nz
      do j = 1, ny
        do i = 1, nx
          do m = 1, 5
            rsd(m,i,j,k) = - frct(m,i,j,k)
          end do
        end do
      end do
    end do
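
The line numbers in Table 2 can be matched against the source excerpt shown on the same slide. The Python sketch below assumes that the excerpt really starts at line 39 of rhs.f, as the slide's "lines 39 - 47" label states. The helper name is hypothetical.

```python
FIRST_LINE = 39  # the slide labels the excerpt "rhs.f, lines 39 - 47"

# The nine source lines shown on the slide, in order.
EXCERPT = [
    "do k = 1, nz",
    "do j = 1, ny",
    "do i = 1, nx",
    "do m = 1, 5",
    "rsd(m,i,j,k) = - frct(m,i,j,k)",
    "end do",
    "end do",
    "end do",
    "end do",
]


def source_line(line_number):
    """Return the excerpt text for a file line number, or None if outside it."""
    index = line_number - FIRST_LINE
    if 0 <= index < len(EXCERPT):
        return EXCERPT[index]
    return None


print(source_line(43))  # the statement behind the "line.43" row
print(source_line(96))  # outside the excerpt shown on the slide
```

Under that assumption, the line.43 row corresponds to the assignment inside the four nested loops. The other rows (96, 228, 246, 336) fall outside the excerpt.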
CrayPat – lite with sample profiling, big case

!  Big case from the NAS Parallel Benchmarks: LU, class E, 2048 MPI processes
!  Overhead: 0.58%
!  Better MPI mapping topology detected

MPI Grid Detection:

    There appears to be point-to-point MPI communication in a 32 X 64
    grid pattern. The 14.7% of the total execution time spent in MPI
    functions might be reduced with a rank order that maximizes
    communication between ranks on the same node. The effect of several
    rank orders is estimated below.

    A file named MPICH_RANK_ORDER.Grid was generated along with this

    Rank Order    On-Node      On-Node     MPICH_RANK_REORDER_METHOD
                  Bytes/PE     Bytes/PE%
                               of Total
                               Bytes/PE

    Custom        5.981e+12    84.20%      3
    SMP           4.614e+12    64.96%      1
    RoundRobin    2.342e+12    32.98%      0
    Fold          7.209e+10     1.01%      2

!  A file named MPICH_RANK_ORDER.Grid has been created
!  Execution improved by 2.1%
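
A toy model shows why rank order changes the share of on-node communication. The Python sketch below rests on loud assumptions: a hypothetical 6 x 4 process grid, 4 ranks per node, nearest-neighbor exchanges of equal size, and a hypothetical 2 x 2 tiling standing in for a custom order. It is not the estimator CrayPat uses and does not reproduce the byte counts in the table. It only counts how many neighbor pairs share a node under each placement.

```python
ROWS, COLS = 6, 4        # hypothetical process grid, rank = row * COLS + col
RANKS_PER_NODE = 4       # hypothetical node size
NUM_NODES = ROWS * COLS // RANKS_PER_NODE


def neighbor_pairs(rows, cols):
    """All (rank, neighbor) pairs to the right and below, without wraparound."""
    pairs = []
    for r in range(rows):
        for c in range(cols):
            rank = r * cols + c
            if c + 1 < cols:
                pairs.append((rank, rank + 1))
            if r + 1 < rows:
                pairs.append((rank, rank + cols))
    return pairs


def smp_node(rank):
    """SMP-style placement: consecutive ranks fill one node before the next."""
    return rank // RANKS_PER_NODE


def round_robin_node(rank):
    """Round-robin placement: ranks are dealt out one per node in turn."""
    return rank % NUM_NODES


def tiled_node(rank):
    """Illustrative custom placement: each node holds a 2 x 2 tile of the grid."""
    row, col = divmod(rank, COLS)
    return (row // 2) * (COLS // 2) + (col // 2)


def on_node_fraction(node_of):
    """Fraction of neighbor pairs whose two ranks land on the same node."""
    pairs = neighbor_pairs(ROWS, COLS)
    on_node = sum(1 for a, b in pairs if node_of(a) == node_of(b))
    return on_node / len(pairs)


for label, placement in [("tiled", tiled_node), ("smp", smp_node),
                         ("round-robin", round_robin_node)]:
    print(f"{label:12s} {100 * on_node_fraction(placement):5.1f}% on-node pairs")
```

In this toy setup the tiled placement keeps the most neighbor pairs on a node, the SMP-style placement fewer, and round-robin none, which is the same ordering as the Custom, SMP, and RoundRobin rows reported for the real run.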
Apprentice2 - I
Apprentice2 - II
Apprentice2 - III
Reveal tool

!  Compile your code with the Cray compiler to use the results with the Reveal tool
   !  MPIF77 = ftn -h profile_generate -h pl=npb_lu.pl -h noomp -h noacc
   !  module load perftools
   !  make LU NPROCS=64 CLASS=C
   !  pat_build -w lu.C.64
   !  The new file is called lu.C.64+pat
   !  Execute the lu.C.64+pat executable
   !  pat_report -o lu_c_64.txt lu.C.64+XXX.xf
   !  A new file called lu_c_64.ap2 is created
   !  reveal /path/npb_lu.pl /path/lu_c_64.ap2
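
The chain of file names in the steps above is easy to get wrong, so the following Python sketch assembles the command lines as a dry run. It only builds strings from the commands shown on the slide and executes nothing. The function name and its parameters are hypothetical. The .xf file name is passed in unchanged because the slide elides part of it (XXX), and the job launcher is left out because the slide does not show it.

```python
def reveal_workflow(binary, program_library, report_stem, xf_file):
    """Return the post-compile command lines from the slide, in order."""
    instrumented = f"{binary}+pat"  # name of the pat_build output, per the slide
    return [
        f"pat_build -w {binary}",
        f"# run {instrumented} through the batch system (launcher not shown)",
        f"pat_report -o {report_stem}.txt {xf_file}",
        f"reveal /path/{program_library} /path/{report_stem}.ap2",
    ]


steps = reveal_workflow(
    binary="lu.C.64",
    program_library="npb_lu.pl",
    report_stem="lu_c_64",
    xf_file="lu.C.64+XXX.xf",  # placeholder kept exactly as on the slide
)
for step in steps:
    print(step)
```

Note that the options are written with a plain ASCII hyphen (-w, -o). Options pasted with a typographic dash are not recognized by the shell tools.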
Reveal tool I

KAUST King Abdullah University of Science and Technology

Reveal tool II

Reveal tool III

Reveal tool IV
Debugging – LGDB
!  LGDB is a line-mode parallel debugger for Cray systems
!  Usage: module load cray-lgdb
!  Binaries should be compiled with -g or -Gfast
!  Offers many GDB features, plus extensions for handling parallel processes

ftn -g -o exec exec.f

salloc
module load cray-lgdb
lgdb
launch $pset{8} ./exec
break exec.f:3
continue
print $pset::myRank
pset[0]: 0
…
pset[7]: 7

!  Other tools are available, such as TotalView and DDT

Conclusions

!  Many tools can help you gain insight into your application

!  Perftools-lite is straightforward for a new user

!  Potential to port code from a serial or MPI version to OpenMP or hybrid, respectively

!  Take advantage of the tools

Thank you!
[email protected]
