This paper describes how we successfully used the HPX programming model to port the DCA++ application to multiple architectures, including POWER9, x86, ARM v8, and NVIDIA GPUs. We describe the lessons learned from this experience and the benefits of enabling HPX in the application to improve the CPU threading part of the code, which led to an overall 21% improvement across architectures. We also describe how we used HPX-APEX to raise the level of abstraction for understanding performance issues and identifying tasking optimization opportunities in the code, and how these relate to CPU/GPU utilization counters, device memory allocation over time, and CPU kernel-level context switches on a given architecture.