Parallel iterative applications often suffer from load imbalance, one of the most critical performance degradation factors. Hence, load balancing techniques are used to distribute the workload evenly to maximize performance. A key challenge is to know \textit{when} to use load balancing techniques. In general, this is done through load balancing criteria, which trigger load balancing based on runtime application data and/or user-defined information. In the first part of this paper, we introduce a novel, automatic load balancing criterion derived from a simple mathematical model. In the second part, we propose a branch-and-bound algorithm to find the load balancing iterations that lead to the optimal application performance. This algorithm finds the optimal load balancing scenario in quadratic time while, to the best of our knowledge, this has never been addressed in less than an exponential time. Finally, we compare the performance of the scenarios produced by state-of-the-art load balancing criteria relative to the optimal load balancing scenario in synthetic benchmarks and parallel N-body simulations. In the synthetic benchmarks, we observe that the proposed criterion outperforms the other automatic criteria. In the numerical experiments, we show that our new criterion is, on average, $4.9\%$ faster than state-of-the-art load balancing criteria and can outperform them by up to $17.6\%$. Moreover, we see in the numerical study that the state-of-the-art automatic criteria are at worst $47.4\%$ slower than the optimum and at best $16.5\%$ slower.
翻译:平行迭代应用程序往往受到负载不平衡的影响,这是最关键的性能退化因素之一。 因此,用负负负平衡技术来均衡分配工作量,以便最大限度地提高性能。 一个关键的挑战是如何了解\ textit{} 何时使用负负平衡技术。 一般而言,这是通过负负平衡标准完成的,根据运行时间应用数据和/或用户定义的信息触发负负平衡。 在本文第一部分,我们引入了一种由简单数学模型产生的新的自动负负平衡标准。 在第二部分,我们提出一个分支和约束的算法,以找到导致最佳应用性能的平衡迭代值。 这个算法在四边时间找到最佳的负负平衡情景,而据我们所知,这从未在指数化的时间里得到解决。 最后,我们比较了当前最先进的负载平衡标准与综合基准和平行的N体模拟中最佳负负载平衡假设的性能。 在合成基准中,我们发现拟议的标准优于其他自动标准。 在数字实验中,我们发现我们最优的负重负重标准比正值标准要快。