Heterogeneous computing systems provide high performance and energy efficiency. However, to utilize such systems optimally, solutions are needed that distribute the work across the host CPUs and the accelerating devices. In this paper, we present a performance- and energy-aware approach that combines AI planning heuristics for parameter space exploration with a machine learning model for performance and energy evaluation to determine a near-optimal system configuration. For data-parallel applications, our approach determines a near-optimal host-device distribution of work, the number of processing units required, and the corresponding scheduling strategy. We evaluate our approach on various heterogeneous systems accelerated with GPUs or the Intel Xeon Phi. The experimental results demonstrate that our approach finds a near-optimal system configuration by evaluating only about 7% of the reasonable configurations. Furthermore, estimating the performance per Joule of a system configuration with our machine learning model is more than 1000x faster than evaluating the system by program execution.
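To make the idea concrete, the following minimal Python sketch illustrates the kind of search the abstract describes: a heuristic exploration of a configuration space (device work fraction, CPU thread count, scheduling strategy) in which each candidate is scored by a fast surrogate model instead of a full program execution. The parameter ranges, the `surrogate_perf_per_joule` stand-in, and the hill-climbing heuristic are illustrative assumptions only, not the paper's AI planning heuristics or its trained machine learning model.

```python
# Illustrative sketch (assumed, not the authors' implementation): heuristic search
# over system configurations, scored by a surrogate instead of real executions.
import random

# Hypothetical configuration space: fraction of work offloaded to the accelerator,
# number of host CPU threads, and the scheduling strategy.
DEVICE_FRACTIONS = [i / 10 for i in range(11)]   # 0.0 .. 1.0
CPU_THREADS = [1, 2, 4, 8, 16, 32]
SCHEDULES = ["static", "dynamic", "guided"]


def surrogate_perf_per_joule(frac, threads, schedule):
    """Stand-in for a trained ML model: returns a predicted performance-per-Joule
    score without executing the program (the formula here is purely synthetic)."""
    balance = 1.0 - abs(frac - 0.7)               # assume ~70% offload works best
    thread_eff = min(threads, 16) / 16.0          # diminishing returns past 16 threads
    sched_bonus = {"static": 0.9, "dynamic": 1.0, "guided": 0.95}[schedule]
    return balance * thread_eff * sched_bonus


def hill_climb(steps=40):
    """Simple local-search heuristic: evaluates only a small fraction of the space."""
    cfg = (random.choice(DEVICE_FRACTIONS),
           random.choice(CPU_THREADS),
           random.choice(SCHEDULES))
    best = (surrogate_perf_per_joule(*cfg), cfg)
    evaluated = 1
    for _ in range(steps):
        # Perturb one or more dimensions of the current best configuration.
        frac, threads, sched = best[1]
        neighbour = (
            random.choice(DEVICE_FRACTIONS) if random.random() < 0.34 else frac,
            random.choice(CPU_THREADS) if random.random() < 0.34 else threads,
            random.choice(SCHEDULES) if random.random() < 0.34 else sched,
        )
        score = surrogate_perf_per_joule(*neighbour)
        evaluated += 1
        if score > best[0]:
            best = (score, neighbour)
    return best, evaluated


if __name__ == "__main__":
    total = len(DEVICE_FRACTIONS) * len(CPU_THREADS) * len(SCHEDULES)
    (score, cfg), evaluated = hill_climb()
    print(f"best predicted configuration: {cfg} (score {score:.3f})")
    print(f"evaluated {evaluated} of {total} configurations "
          f"({100 * evaluated / total:.1f}% of the space)")
```

Because every candidate is scored by the surrogate rather than by running the application, the search touches only a small share of the configuration space, which is the effect the abstract quantifies (about 7% of reasonable configurations evaluated, with model-based estimation orders of magnitude faster than execution).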