Profiling techniques on SDSC systems
Dmitry Pekurovsky
SDSC Summer Institute
July 16, 2007
Overview of Talk: Standard profiling using prof, gprof and gmon (DataStar, Blue Gene)
IBM High Performance Computing Toolkit (IHPCT) on DataStar and Blue Gene
Hardware Performance Monitoring – HPM
MPI Tracer/Profiler
Xprofiler - CPU profiling tool
PeekPerf – Visualization of performance trace information
Integrated Performance Monitoring (IPM) on DataStar and Blue Gene
Suggested standard tuning procedure
Standard Profiling using prof, gprof: Standard profiling (prof, gprof) is available on both DataStar and Blue Gene.
Three levels of profiling are available with gmon, depending on the –pg and –g options on the compile and link commands
Timer tick profiling information: Add –pg to the link options
Procedure level profiling with timer tick info: Add –pg to compile and link options
Full profiling – call graph info, statement level profiling, basic block profiling, and machine instruction profiling: Add –pg –g to the compile and link options
Each task generates a gmon.out.x file where x corresponds to the rank of the task.
Output can be read using the gprof command (and Xprofiler as detailed later in the talk)
Available tools
Example of profiling using gmon: Step 1: Compile using the –pg and –g options:
mpxlf -pg -g pois-imp.f -o example1
Step 2: Run the code to produce the gmon.out.x files:
ds100 % ls gmon.out.*
gmon.out.0 gmon.out.1 gmon.out.2 gmon.out.3
Step 3: Use gprof to analyze the output:
gprof -s example1 gmon.out.0 gmon.out.1 gmon.out.2 gmon.out.3
gprof example1 gmon.sum > summary.dat
(The first command produces gmon.sum, which is analyzed by the second command with the output redirected to summary.dat)
Example of profiling using gmon: gprof output has the call graph info, flat profile and function index
Section of sample call graph:
called/total parents
index %time self descendents called+self name index
called/total children
0.00 4.13 4/4 .__start [3]
[1] 82.8 0.00 4.13 4 .poisimp [1]
4.10 0.03 4/4 .poisson [2]
-------------------------------------------------------------------
4.10 0.03 4/4 .poisimp [1]
[2] 82.8 4.10 0.03 4 .poisson [2]
0.02 0.00 32000/32000 .solve [12]
0.01 0.00 84032/84032 ._sin [26]
------------------------------------------------------------------
Example of profiling using gmon: Section of sample flat profile & function index
% cumulative self self total
time seconds seconds calls ms/call ms/call name
82.2 4.10 4.10 4 1025.00 1032.50 .poisson [2]
1.2 4.16 0.06 uitrunc_const [4]
1.0 4.21 0.05 .lapi_recv_vec [5]
1.0 4.26 0.05 .shm_submit_slot [6]
0.8 4.30 0.04 ._Vector_dgsp_xfer [7]
0.8 4.34 0.04 .mpci_send [8]
0.6 4.37 0.03 ._lapi_shm_get [9]
0.6 4.40 0.03 ._mpi_allgather [10]
0.6 4.43 0.03 .allgather_tree_b [11]
0.4 4.45 0.02 32000 0.00 0.00 .solve [12]
...
...
Index by function name
[27] .LAPI_Util.GL [9] ._lapi_shm_get [47] .atoi
[28] .MPI(int) [40] ._lapi_shm_setup [48] .fast_free
[29] .MPID_Welcome_EA [41] ._mem_alloc [49] .fclose_unlocked
[13] .MPID_msg_arrived [10] ._mpi_allgather [50] .fflush_unlocked
[30] .MPI__Allgather [17] ._null_hndlr [51] .free.GL
Xprofiler: CPU profiling tool similar to gprof
Can be used to profile both serial and parallel applications
Uses procedure-profiling information to construct a graphical display of the functions within an application
Provides quick access to the profiled data and helps users identify the most CPU-intensive functions
Based on sampling (support from both compiler and kernel)
Charges execution time to source lines and can show disassembled code
Xprofiler is in your default path on DataStar
Running Xprofiler: Compile the program with -g -pg
Run the program
A gmon.out file is generated (MPI applications generate one gmon.out.x file per task, e.g. gmon.out.0, gmon.out.1, …)
Run Xprofiler
On DataStar: xprofiler a.out gmon.* (a complete sketch follows below)
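A minimal end-to-end sketch on DataStar (the source file name, executable name and task count are placeholders, and the poe option shown is an assumption; adjust for your batch environment):

mpcc -g -pg heat.c -o heat      # compile with profiling enabled
poe ./heat -procs 4             # run; each MPI task writes its own gmon.out.x
xprofiler heat gmon.out.*       # load all per-task profiles into Xprofiler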
Xprofiler Main Display: Width of a bar: time including called routines
Height of a bar: time excluding called routines
Call arrows labeled with number of calls
Overview window for easy navigation (View Overview)
Xprofiler - Disassembler Code
HPMCOUNT: hpmcount runs an application and then reports execution wall clock time, hardware performance counter information, derived hardware metrics, and resource utilization statistics (such as memory usage).
Usage
Serial Job: hpmcount executable_name
Parallel jobs: poe hpmcount executable_name <poe options>
For parallel jobs you can use the above poe line in a batch script.
Note that there is HPM output for each task when you run this in parallel. Set MP_LABELIO=yes to label the output with the task ID (see the sketch below).
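A short sketch of a parallel hpmcount run (ksh/bash syntax; the executable name and task count are placeholders, and poe options may differ at your site):

export MP_LABELIO=yes      # prefix each output line with its task ID
export MP_PROCS=4          # number of MPI tasks (placeholder value)
poe hpmcount ./example1    # one hpmcount report is produced per task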
HPMCOUNT (cont.)
HPM event groups
HPMCOUNT output example
LIBHPM: Instrumentation library
Provides performance information for instrumented program sections
Supports multiple (nested) instrumentation sections
Multiple sections may have the same ID
Run-time performance information collection
Available on Datastar and Blue Gene
Functions: hpmInit( taskID, progName ) / f_hpminit( taskID, progName )
taskID is an integer value indicating the node ID.
progName is a string with the program name.
hpmStart( instID, label ) / f_hpmstart( instID, label )
instID is the instrumented section ID. It should be > 0 and <= 100 (can be overridden)
Label is a string containing a label, which is displayed by PeekPerf.
hpmStop( instID ) / f_hpmstop( instID )
For each call to hpmStart, there should be a corresponding call to hpmStop with matching instID
hpmTerminate( taskID ) / f_hpmterminate( taskID )
This function will generate the output. If the program exits without calling hpmTerminate, no performance information will be generated.
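A minimal C instrumentation sketch using the calls above (the header name, the measured kernel and the compile/link details are assumptions; use the HPMINC and HPMLIB settings shown on the trace-flags slide below):

#include <mpi.h>
#include "libhpm.h"                 /* libhpm header; exact name assumed */

int main(int argc, char **argv)
{
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    hpmInit(rank, "example1");      /* taskID, program name */
    hpmStart(1, "main loop");       /* instID 1, label displayed by PeekPerf */
    /* ... computational kernel to be measured goes here ... */
    hpmStop(1);                     /* must match the hpmStart instID */
    hpmTerminate(rank);             /* writes the performance output */

    MPI_Finalize();
    return 0;
}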
Message-Passing Performance: MP_Profiler Library
Captures 'summary' data for MPI calls
Source code traceback
User MUST call MPI_Finalize() in order to get output files.
No changes to source code
MUST compile with –g to obtain source line number information
MP_Tracer Library
Captures 'timestamped' data for MPI calls
Source traceback
Available on Datastar and Blue Gene
Trace flags: DataStar:
IHPCT_BASE=/usr/local/apps/ihpct
TRACELIB=$(IHPCT_BASE)/lib
MPITRACE = -L$(TRACELIB) -lmpitrace
MPIPROF = -L$(TRACELIB) -lmpiprof
HPMINC=$(IHPCT_BASE)/include
HPMLIB = -L$(IHPCT_BASE)/lib/pwr4 -lhpm_r -lpmapi -lm
Blue Gene (new - untested)
IHPCT_BASE = /usr/local/apps/hpc_toolkit
TRACELIB=$(IHPCT_BASE)/lib
MPITRACE = -L$(TRACELIB) -lmpitrace_f (OR -lmpitrace_c)
HPMINC=$(IHPCT_BASE)/include
HPMLIB = -L$(IHPCT_BASE)/lib -lhpm.rts -lpmapi -lm
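A sketch of how the MPITRACE flags above might be used in a DataStar makefile link rule (the compiler wrapper, source and target names are placeholders; -g is required for source line information, and the recipe line must start with a tab):

example1: main.c
	mpcc -g -o example1 main.c $(MPITRACE)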
MP_Profiler Summary Output
MP_Profiler Sample Call Graph Output
MP_Profiler Message Size Distribution
Environment Flags: TRACELEVEL
Number of levels to trace back the caller in the stack
Used to skip wrappers
Default: 0
TRACE_TEXTONLY
If set to '1', plain text output is generated
Otherwise, a viz file is generated
TRACE_PERFILE
If set to '1', the output is shown for each source file
Otherwise, output is a summary of all source files
TRACE_PERSIZE
If set to '1', the statistics for a function are shown for every message size
Otherwise, a summary over all message sizes is given
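A hedged example of setting these flags before a traced run (ksh/bash syntax; the executable and task count are placeholders):

export TRACELEVEL=1        # trace one extra level back to skip wrapper routines
export TRACE_TEXTONLY=1    # write plain-text output instead of a .viz file
export TRACE_PERSIZE=1     # report statistics per message size
poe ./example1 -procs 4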
PeekPerf: PeekPerf is a viewer for data generated by HPM, the Tracer and Profiling libraries, and DPOMP.
Integrated Performance Monitoring (IPM): Allows users to obtain a concise summary of the performance and communication characteristics of their codes.
Information on use available at http://www.sdsc.edu/us/tools/top/ipm
On Blue Gene you need to recompile your code, linking to the IPM library by adding
-L/usr/local/apps/ipm/lib/ -lipm
to the link stage. For example:
C: mpcc main.c -L/usr/local/apps/ipm/lib/ -lipm
Fortran: mpxlf90 main.f -L/usr/local/apps/ipm/lib/ -lipm
Run your job using poe-ipm on DataStar and mpirun-ipm on Blue Gene.
DO NOT use together with HPMCOUNT!
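Hypothetical run lines using the IPM wrappers (task counts and arguments are assumptions; the wrappers are expected to accept the usual poe and mpirun options, so check the IPM page above for the exact syntax):

poe-ipm ./main -procs 64          # DataStar
mpirun-ipm -np 512 -exe ./main    # Blue Gene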
IPM Output: In addition to the summary, an in-depth analysis is available, including:
Load balancing
Communication pattern topology
Message size distribution
A file will be produced with a name combining your username and a number generated by IPM (for example mahidhar.1160615104.920400.0)
To generate a Web page showing detailed analysis of your code, run the ipm_parse_sdsc command followed by the filename.
bg-login1 0512/RUN1> /usr/local/apps/ipm/bin/ipm_parse_sdsc mahidhar.1160615104.920400.0
IPM at SDSC - Webpage creation in progress
Please wait - this may take several minutes.
100..200..300..400..500..
IPM: Data processing finished - Creating HTML output - please wait.
The web page will be visible at:
http://www.sdsc.edu/us/tools/top/ipm/output/bgsn.14860.0
Note the webpage will stay online for 30 days
It can be regenerated at any time,
or a local copy can be saved using your web browser
IPM results: Webpage snapshot
Standard Tuning Procedure: Pick a suitable dataset (a good representation of your production runs) and an optimal processor set
Get a rough estimate of FLOPS. Running with hpmcount or IPM (on DataStar) is the quickest way to do this.
5-15% of peak is the normal range (a worked example follows this list)
Understand scaling problems by running at different processor counts
Single processor performance profiling:
gprof, Xprofiler, HPM – identify routines or regions that dominate execution time
Consider creating a simple kernel that manifests the same behavior – ease of testing
HPM – study CPU performance in detail (cache use etc)
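As a rough worked example (the hardware numbers here are assumptions, not taken from this talk): a 1.5 GHz POWER4+ processor that can complete 4 floating-point operations per cycle has a peak of about 6 GFlop/s, so 5-15% of peak corresponds to roughly 0.3-0.9 GFlop/s per processor in the hpmcount or IPM report.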
MPI profiling
References: DataStar user guide
http://www.sdsc.edu/us/resources/datastar/
Blue Gene user guide
http://www.sdsc.edu/us/resources/bluegene
IBM HPC Toolkit Link
https://domino.research.ibm.com/comm/research_projects.nsf/pages/actc.index.html
IPM
http://www.sdsc.edu/us/tools/top/ipm