Runtime-Adaptable Selective Performance Instrumentation

From arXiv. To be published in the proceedings of the 28th International Workshop on High-Level Parallel Programming Models and Supportive Environments.

Automated code instrumentation, i.e. the insertion of measurement hooks into a target application by the compiler, is an established technique for collecting reliable, fine-grained performance data. The set of functions to instrument has to be selected with care, as instrumenting every available function typically yields too large a runtime overhead, thus skewing the measurement. No "one-suits-all" selection mechanism exists, since the instrumentation decision is dependent on the measurement objective, the limit for tolerable runtime overhead and peculiarities of the target application. The Compiler-assisted Performance Instrumentation (CaPI) tool assists in creating such instrumentation configurations, by enabling the user to combine different selection mechanisms as part of a configurable selection pipeline, operating on a statically constructed whole-program call-graph. Previously, CaPI relied on a static instrumentation workflow which made the process of refining the initial selection quite cumbersome for large-scale codes, as the application had to be recompiled after each adjustment. In this work, we present new runtime-adaptable instrumentation capabilities for CaPI which do not require recompilation when instrumentation changes are made. To this end, the XRay instrumentation feature of the LLVM compiler was extended to support the instrumentation of shared dynamic objects. An XRay-compatible runtime system was added to CaPI that instruments selected functions at program start, thereby significantly reducing the required time for selection refinements. Furthermore, an interface to the TALP tool for recording parallel efficiency metrics was implemented, alongside a specialized selection module for creating suitable coarse-grained region instrumentations.

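Paraphrase: Automated code instrumentation, in which the compiler inserts measurement hooks into the target application, is a mature technique for collecting reliable, fine-grained performance data. The set of functions to instrument must be chosen carefully, because instrumenting every available function usually incurs too much runtime overhead and distorts the measurement. There is no one-size-fits-all selection mechanism, since the instrumentation decision depends on the measurement objective, the tolerable overhead limit, and the characteristics of the target application. The Compiler-assisted Performance Instrumentation (CaPI) tool helps create such instrumentation configurations by letting the user combine different selection mechanisms in a configurable selection pipeline that operates on a statically constructed whole-program call graph. Previously, CaPI relied on a static instrumentation workflow: the application had to be recompiled after every adjustment, which made refining the initial selection cumbersome for large codes. This work adds runtime-adaptable instrumentation to CaPI, so that instrumentation changes no longer require recompilation. To this end, the XRay instrumentation feature of the LLVM compiler was extended to support instrumenting shared dynamic objects. An XRay-compatible runtime system was added to CaPI that instruments the selected functions at program start, significantly reducing the time needed for selection refinements. In addition, an interface to the TALP tool for recording parallel efficiency metrics was implemented, along with a specialized selection module for creating suitable coarse-grained region instrumentation.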
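To make the idea of a selection pipeline over a call graph more concrete, here is a minimal sketch in Python. It is not CaPI's actual implementation or its configuration syntax: the call graph, the function names, and the two selectors (`reachable_from`, `on_call_path_to`) are all hypothetical and chosen only for illustration. The pipeline keeps functions that are reachable from `main` and lie on a call path to an MPI routine, then drops the MPI routines themselves.

```python
from collections import deque

# Hypothetical whole-program call graph: caller -> list of callees.
CALL_GRAPH = {
    "main": ["init", "solve", "finalize"],
    "init": ["read_mesh"],
    "solve": ["assemble", "MPI_Allreduce", "small_helper"],
    "assemble": ["small_helper"],
    "read_mesh": [],
    "finalize": [],
    "small_helper": [],
    "MPI_Allreduce": [],
    "unused_debug_dump": [],
}


def reachable_from(graph, root):
    """Selector: every function reachable from `root` (breadth-first search)."""
    seen = {root}
    queue = deque([root])
    while queue:
        fn = queue.popleft()
        for callee in graph.get(fn, []):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return seen


def on_call_path_to(graph, targets):
    """Selector: every function that can reach one of `targets` (targets included)."""
    # Build the reversed graph (callee -> callers) and search it from the targets.
    callers = {fn: set() for fn in graph}
    for caller, callees in graph.items():
        for callee in callees:
            callers.setdefault(callee, set()).add(caller)
    seen = set(targets)
    queue = deque(targets)
    while queue:
        fn = queue.popleft()
        for caller in callers.get(fn, ()):
            if caller not in seen:
                seen.add(caller)
                queue.append(caller)
    return seen


def select(graph):
    """Pipeline: combine the selectors with set operations."""
    reachable = reachable_from(graph, "main")
    mpi = {fn for fn in graph if fn.startswith("MPI_")}
    on_path = on_call_path_to(graph, mpi)
    return sorted((reachable & on_path) - mpi)


selection = select(CALL_GRAPH)
print(selection)  # ['main', 'solve']
```

In a runtime-adaptable workflow of the kind the abstract describes, a list like `selection` would be handed to the runtime system at program start, so changing it would not require rebuilding the application. How CaPI stores and loads that list is not specified here.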