Loop scheduling techniques aim to achieve load-balanced execution of scientific applications. Dynamic loop self-scheduling (DLS) libraries for distributed-memory systems are typically MPI-based and employ a centralized chunk calculation approach (CCA) to assign variably sized chunks of loop iterations. We present a distributed chunk calculation approach (DCA) that supports various types of DLS techniques. Using both CCA and DCA, we implement twelve DLS techniques and evaluate them under different CPU slowdown scenarios. The results show that the DLS techniques implemented with DCA outperform their CCA counterparts, especially in extreme system slowdown scenarios.
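To make the CCA/DCA distinction concrete, the following is a minimal Python sketch under stated assumptions, not the implementation evaluated here. It uses factoring with a fixed factor of 2 (FAC2), chosen only for illustration. In the centralized variant, one coordinator derives every chunk size from the number of remaining iterations. The distributed variant assumes a closed-form chunk-size formula that depends only on the scheduling step index `i`, so each worker can compute its own chunk locally and the only shared state is a step counter. Threads and a lock stand in for MPI processes and an atomic fetch-and-add. The function names (`fac2_centralized`, `fac2_size`, `chunk_at`, `fac2_distributed`) and the closed form are hypothetical.

```python
import math
import threading


def fac2_centralized(n, p):
    """CCA-style sketch: a single coordinator derives each chunk from the remaining count R."""
    chunks, start, remaining = [], 0, n
    while remaining > 0:
        size = math.ceil(remaining / (2 * p))  # FAC2: half of R split over P workers
        for _ in range(p):  # one batch consists of P equal chunks
            if remaining == 0:
                break
            take = min(size, remaining)  # clip the final chunk
            chunks.append((start, take))
            start += take
            remaining -= take
    return chunks


def fac2_size(i, n, p):
    """Assumed closed form: the chunk size depends only on the step index i."""
    return math.ceil(n / (p * 2 ** (i // p + 1)))


def chunk_at(i, n, p):
    """Any worker can compute chunk i locally; its start is the sum of earlier sizes."""
    start = sum(fac2_size(k, n, p) for k in range(i))
    if start >= n:
        return None  # no iterations left for this step
    return start, min(fac2_size(i, n, p), n - start)


def fac2_distributed(n, p):
    """DCA-style sketch: workers share only a step counter and compute chunks themselves."""
    lock = threading.Lock()
    next_step = [0]
    done = []  # (rank, start, size) records, kept for inspection

    def worker(rank):
        while True:
            with lock:  # emulates an atomic fetch-and-add on the shared step counter
                i = next_step[0]
                next_step[0] += 1
            chunk = chunk_at(i, n, p)  # chunk calculation happens outside the lock
            if chunk is None:
                return
            with lock:
                done.append((rank, *chunk))

    threads = [threading.Thread(target=worker, args=(r,)) for r in range(p)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(c[1:] for c in done)  # (start, size) pairs ordered by start


if __name__ == "__main__":
    print(fac2_centralized(100, 4))
    print(fac2_distributed(100, 4))
```

Both variants cover every iteration exactly once. However, the closed form does not compound rounding the way the remaining-count recurrence does, so the two schedules can differ slightly. For `n=100, p=4`, the second batch uses chunks of 7 rather than 6. The design point the sketch illustrates is that, in the distributed variant, no single process performs all chunk calculations.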