PGPROF Graphical Performance Profiler


Performance Profile Parallel MPI and OpenMP Applications

PGPROF® is a powerful and simple-to-use interactive postmortem statistical analyzer for MPI-parallel and OpenMP thread-parallel programs. Use PGPROF to visualize and diagnose the performance of the components of your program. Using tables and graphs, PGPROF associates execution time with the source code and instructions of your program, allowing you to see where and how execution time is spent. Through resource utilization data and compiler feedback information, PGPROF also helps you understand why certain parts of your program have high execution times.

PGPROF complements PGI's powerful MPI and OpenMP parallel graphical debugger PGDBG®.


Use PGPROF to analyze programs on multicore SMP servers, distributed-memory clusters, and hybrid clusters where each node contains multicore x64 processors. PGPROF can profile parallel programs, including multiprocess MPI programs, multi-threaded OpenMP programs, or a combination of both. PGPROF allows profiling at the function, source code line, and assembly instruction level for PGI-compiled Fortran, C, and C++ programs. PGPROF provides views of the performance data for analysis of MPI communication, multiprocess and multi-thread load balancing, and scalability.

Using the Common Compiler Feedback Format (CCFF), PGI compilers save information about how your program was optimized, or why a particular optimization was not made. PGPROF can extract this information and associate it with source code and other performance data, allowing you to view all of this information simultaneously. PGPROF also supports a feedback-only mode, which allows you to browse compiler feedback associated with a CCFF-enabled binary executable in the absence of a performance profile.



PGPROF provides the information required to determine which functions and lines in an application are consuming the most execution time. Combined with the feedback features of the PGI compilers, PGPROF helps you maximize vectorization and performance on a single x64 processor core. PGPROF exposes performance bottlenecks in a cluster application by presenting the number of calls, aggregate message size, and execution time of individual MPI function calls on a line-by-line basis.



In the figure above, the 'Scale' column shows that some functions like f_nonbon scale at about 1/2 linear speedup, while others like a_next slow down when run with an increased number of threads. The 'Parallelism' table below shows that execution of mm_fv_update_nonbon is not perfectly balanced between threads, with thread zero spending 33% of the time in that routine, but thread three spending only 20%.
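The load-balance reading described above can be reduced to a single number. The following is a minimal sketch, not PGPROF's own calculation: it takes per-thread percentages for one routine and reports the ratio of the busiest thread to the average. The function name `imbalance_factor` is hypothetical, and only the values for threads zero and three come from the text; the values for threads one and two are placeholders chosen for illustration.

```python
def imbalance_factor(shares):
    """Return the largest per-thread share divided by the mean share.

    A value of 1.0 means every thread spent the same share of time in
    the routine; larger values mean the busiest thread carries
    proportionally more of the work.
    """
    mean = sum(shares) / len(shares)
    return max(shares) / mean


# Percent of time spent in mm_fv_update_nonbon, indexed by thread.
# Threads 0 (33%) and 3 (20%) are taken from the text above; the values
# for threads 1 and 2 are ASSUMED placeholders, not measured data.
shares = [33.0, 25.0, 22.0, 20.0]

print(f"imbalance factor: {imbalance_factor(shares):.2f}")
```

A factor well above 1.0 suggests that redistributing work in that routine, for example through a different loop schedule, may be worth investigating.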

Using PGPROF, you can merge profiles from multiple runs on different numbers of nodes to perform scalability analysis on your MPI or OpenMP application at the application, function, or line level. Scalability analysis allows you to quickly see which parts of your application are barriers to scalable performance, and where your parallel tuning efforts should be focused. PGPROF displays information in easy-to-use formats such as bar charts, percentages, counts, or seconds, and displays profiles using graphical histograms.
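To make the idea of scalability analysis concrete, here is a small sketch of the standard speedup and parallel-efficiency ratios applied to per-function timings from two runs. It is an illustration only: how PGPROF computes its own 'Scale' column is not specified here, the helper names are hypothetical, and the timings are invented so that one function reaches about half of linear speedup and another slows down.

```python
def speedup(t_base, t_scaled):
    """Ratio of baseline time to scaled-run time; below 1.0 is a slowdown."""
    return t_base / t_scaled


def parallel_efficiency(t_base, n_base, t_scaled, n_scaled):
    """Speedup divided by the increase in thread (or process) count.

    1.0 corresponds to linear speedup; 0.5 to half of linear speedup.
    """
    return speedup(t_base, t_scaled) / (n_scaled / n_base)


# ASSUMED example data: seconds per function on 1 thread and on 4 threads.
# The function names echo the figure discussion; the timings are invented.
N_BASE, N_SCALED = 1, 4
timings = {
    "f_nonbon": (8.0, 4.0),
    "a_next": (1.0, 1.25),
}

for name, (t_base, t_scaled) in timings.items():
    s = speedup(t_base, t_scaled)
    e = parallel_efficiency(t_base, N_BASE, t_scaled, N_SCALED)
    verdict = "slows down" if s < 1.0 else "speeds up"
    print(f"{name}: speedup {s:.2f}, efficiency {e:.2f} ({verdict})")
```

Functions with low efficiency or a speedup below 1.0 are the natural first candidates for parallel tuning.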

Performance data from your application can be collected in a number of ways. Use the pgcollect tool for basic execution-time profiling. For more specialized needs, PGPROF supports instrumentation-based profiling and sample-based profiling, including time-based sampling and event-based sampling using hardware counters.

Powerful GUI
Analyzing a parallel application can be extremely challenging. PGPROF provides a comprehensive set of graphical user interface (GUI) elements to assist. The PGPROF GUI displays information in familiar, easy-to-use formats such as bar charts, percentages, counts, or seconds. PGPROF also supports visualizing a profile using graphical histograms.

With PGPROF, you can quickly determine where execution time is spent and see which functions were called and how often. Use PGPROF to quickly analyze MPI sends, MPI receives, and other MPI communication. Information on time spent in thread-parallel regions is also readily accessible. PGPROF supports function, instruction, and source-line level profiling. PGPROF can even be used to effectively profile optimized code at the block level using PGI's unique instrumentation or a sample-based, gprof-style methodology. PGPROF's scalability comparison feature, which uses hardware counters on Linux, provides a reliable, low-overhead means to measure linear speed-up or slow-down between multiple executions of an application.