Heterogeneity has become a mainstream architecture design choice for building High Performance Computing systems. However, heterogeneity poses significant challenges for achieving performance portability of execution. Adapting a program to a new heterogeneous platform is laborious and requires developers to manually explore a vast space of execution parameters. To address those challenges, this paper proposes new extensions to OpenMP for autonomous, machine learning-driven adaptation. Our solution includes a set of novel language constructs, compiler transformations, and runtime support. We propose a producer-consumer pattern to flexibly define multiple, different variants of OpenMP code regions to enable adaptation. Those regions are transparently profiled at runtime to autonomously learn optimizing machine learning models that dynamically select the fastest variant. Our approach significantly reduces users' efforts of programming adaptive applications on heterogeneous architectures by leveraging machine learning techniques and code generation capabilities of OpenMP compilation. Using a complete reference implementation in Clang/LLVM we evaluate three use-cases of adaptive CPU-GPU execution. Experiments with HPC proxy applications and benchmarks demonstrate that the proposed adaptive OpenMP extensions automatically choose the best performing code variants for various adaptation possibilities, in several different heterogeneous platforms of CPUs and GPUs.
翻译:差异性已成为建设高性能计算系统的主流建筑设计选择。然而,差异性对于实现执行的可移动性提出了巨大的挑战。将一个程序改换到一个新的多元平台是一项艰巨的任务,要求开发者手工探索执行参数的广阔空间。为了应对这些挑战,本文件提议将开放MP的新的扩展用于自主、机械学习驱动的适应。我们的解决方案包括一套新型语言结构、编译器转换和运行时间支持。我们建议了一种生产者-消费者模式,灵活定义开放MP代码区域的多种不同的变体,以便能够适应。这些区域在运行时以透明的方式进行配置,以便自动学习最优化的机器学习模型,从而动态地选择最快的变体。我们的方法大大降低了用户在设计多样性结构适应应用的适应性应用的努力,办法是利用机器学习技术和OpenMP汇编的代码生成能力。我们利用Clan/LLLVM的完整参考实施方法来评估适应性CPU-GPU执行的三种使用案例。与HPC的代理应用和基准实验表明,拟议的适应性开放MP扩展自动选择各种适应可能性的最佳执行代码变体。</s>