Benchmarking and comparing the performance of a scientific simulation across hardware platforms is a complex task. When the simulation in question is built on an asynchronous many-task (AMT) runtime that offloads work to GPUs, the task becomes even more complex. In this paper, we discuss the use of a uniquely suited performance measurement library, APEX, to capture the performance behavior of a simulation built on HPX, a highly scalable, distributed AMT runtime. We examine the performance of the astrophysics simulation carried out by Octo-Tiger on two different supercomputing architectures. We analyze the scaling results and the measurement overheads. In addition, we look in depth at two similarly configured executions on the two systems to study how architectural differences affect performance and to identify opportunities for optimization. As one such opportunity, we optimize the communication for the hydro solver and investigate its performance impact.