As we enter the exascale computing era, using power efficiently and optimizing the performance of scientific applications under power and energy constraints has become both critical and challenging. We propose a low-overhead autotuning framework that autotunes performance and energy for a variety of hybrid MPI/OpenMP scientific applications at large scales and explores the tradeoffs between application runtime and power/energy for energy-efficient execution. We then use this framework to autotune four ECP (Exascale Computing Project) proxy applications: XSBench, AMG, SWFFT, and SW4lite. Our approach uses Bayesian optimization with a Random Forest surrogate model to effectively search parameter spaces of up to 6 million configurations on two large-scale production systems, Theta at Argonne National Laboratory and Summit at Oak Ridge National Laboratory. The experimental results show that our autotuning framework has low overhead and scales well to large node counts. Using the framework to identify the best configurations, we achieve up to 91.59% performance improvement, up to 21.2% energy savings, and up to 37.84% improvement in energy-delay product (EDP) on up to 4,096 nodes.
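The search strategy named above (Bayesian optimization with a Random Forest surrogate) can be illustrated with a small, self-contained sketch. Everything concrete in it is hypothetical: the three-parameter space, the synthetic `measure()` cost model standing in for real runs on Theta or Summit, the from-scratch forest, and the lower-confidence-bound acquisition function (the abstract does not say which acquisition function the framework uses). It only shows the general pattern: fit a Random Forest to the configurations evaluated so far, treat the spread of per-tree predictions as an uncertainty estimate, and choose the next configuration to run by minimizing predicted EDP minus a multiple of that uncertainty, where EDP is energy multiplied by runtime.

```python
import math
import random
import statistics

# Hypothetical tuning space (illustrative, not from the paper): parameter -> candidate values.
SPACE = {
    "omp_threads": [1, 2, 4, 8, 16, 32, 64],
    "schedule": [0, 1, 2],  # encoded OpenMP schedule, e.g. static/dynamic/guided
    "chunk": [1, 8, 16, 32, 64, 128],
}
NAMES = list(SPACE)

def sample_config(rng):
    """Draw one configuration uniformly at random from SPACE."""
    return tuple(rng.choice(SPACE[n]) for n in NAMES)

def measure(cfg):
    """Synthetic stand-in for running the application: returns (runtime_s, energy_J)."""
    threads, sched, chunk = cfg
    runtime = (100.0 / threads + 0.5 * threads
               + (2.0 if sched == 1 and chunk < 8 else 0.0)
               + 0.3 * abs(math.log2(chunk) - 4))
    power_w = 50.0 + 2.0 * threads
    return runtime, power_w * runtime

def edp(cfg):
    """Energy-delay product: energy (J) times runtime (s); lower is better."""
    runtime, energy = measure(cfg)
    return energy * runtime

def fit_tree(X, y, rng, depth=0, max_depth=6, min_leaf=2):
    """Regression tree; a node is a float leaf or a tuple (feature, threshold, left, right)."""
    if depth >= max_depth or len(y) < 2 * min_leaf or len(set(y)) == 1:
        return statistics.fmean(y)
    n_feats = len(X[0])
    feats = rng.sample(range(n_feats), k=n_feats // 2 + 1)  # random feature subset per split
    best = None
    for f in feats:
        for thr in sorted({x[f] for x in X})[:-1]:
            left = [yi for xi, yi in zip(X, y) if xi[f] <= thr]
            right = [yi for xi, yi in zip(X, y) if xi[f] > thr]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            sse = statistics.pvariance(left) * len(left) + statistics.pvariance(right) * len(right)
            if best is None or sse < best[0]:
                best = (sse, f, thr)
    if best is None:
        return statistics.fmean(y)
    _, f, thr = best
    li = [i for i, x in enumerate(X) if x[f] <= thr]
    ri = [i for i, x in enumerate(X) if x[f] > thr]
    return (f, thr,
            fit_tree([X[i] for i in li], [y[i] for i in li], rng, depth + 1, max_depth, min_leaf),
            fit_tree([X[i] for i in ri], [y[i] for i in ri], rng, depth + 1, max_depth, min_leaf))

def predict_tree(node, x):
    """Walk from the root to a leaf and return the leaf's mean value."""
    while isinstance(node, tuple):
        f, thr, left, right = node
        node = left if x[f] <= thr else right
    return node

def fit_forest(X, y, rng, n_trees=20):
    """Random Forest: each tree is fit on a bootstrap resample of the observations."""
    forest = []
    for _ in range(n_trees):
        idx = rng.choices(range(len(X)), k=len(X))
        forest.append(fit_tree([X[i] for i in idx], [y[i] for i in idx], rng))
    return forest

def predict_forest(forest, x):
    """Mean and spread of per-tree predictions (the spread serves as an uncertainty estimate)."""
    preds = [predict_tree(t, x) for t in forest]
    return statistics.fmean(preds), statistics.pstdev(preds)

def bayes_opt(objective, n_init=8, n_iter=25, n_candidates=200, kappa=1.0, seed=0):
    """Minimize objective over SPACE with an RF surrogate and an (assumed) LCB acquisition."""
    rng = random.Random(seed)
    X = [sample_config(rng) for _ in range(n_init)]  # random initial design
    y = [objective(c) for c in X]
    for _ in range(n_iter):
        forest = fit_forest(X, y, rng)
        seen = set(X)
        cands = [c for c in (sample_config(rng) for _ in range(n_candidates)) if c not in seen]
        if not cands:
            break
        # Lower confidence bound: favor low predicted EDP or high uncertainty.
        def lcb(c):
            mu, sd = predict_forest(forest, c)
            return mu - kappa * sd
        nxt = min(cands, key=lcb)
        X.append(nxt)
        y.append(objective(nxt))
    best = min(range(len(y)), key=y.__getitem__)
    return X[best], y[best]
```

Calling `bayes_opt(edp)` returns the best configuration evaluated and its EDP after 33 measurements. Because the surrogate only ranks candidates, each real "measurement" is spent where the forest predicts low EDP or is most uncertain, which is what lets this style of search cope with spaces far too large to enumerate.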