To address the challenge of performance analysis on the US DOE's forthcoming exascale supercomputers, Rice University has been extending its HPCToolkit performance tools to support measurement and analysis of GPU-accelerated applications. To help developers understand the performance of accelerated applications as a whole, HPCToolkit's measurement and analysis tools attribute metrics to calling contexts that span both CPUs and GPUs. To measure GPU-accelerated applications efficiently, HPCToolkit employs a novel wait-free data structure to coordinate monitoring and attribution of GPU performance. To help developers understand the performance of complex GPU code generated from high-level programming models, HPCToolkit constructs sophisticated approximations of call path profiles for GPU computations. To support fine-grained analysis and tuning, HPCToolkit uses PC sampling and instrumentation to measure and attribute GPU performance metrics to source lines, loops, and inlined code. To supplement fine-grained measurements, HPCToolkit can measure GPU kernel executions using hardware performance counters. To provide a view of how an execution evolves over time, HPCToolkit can collect, analyze, and visualize call path traces within and across nodes. Finally, on NVIDIA GPUs, HPCToolkit can derive and attribute a collection of useful performance metrics based on measurements using GPU PC samples. We illustrate HPCToolkit's new capabilities for analyzing GPU-accelerated applications with several codes developed as part of the Exascale Computing Project.