In this paper, we address the problem of designing a distributed application meant to run both classical and asynchronous iterations. MPI libraries are popular and widely used in the scientific community; however, asynchronous iterative methods raise non-negligible difficulties in efficiently managing communication requests and buffers. Moreover, they introduce a convergence detection issue, which requires implementing one of the various state-of-the-art termination methods, and these are not necessarily highly reliable in most computational environments. We propose an MPI-based communication library that handles all these issues in a non-intrusive manner, providing a single interface for implementing both classical and asynchronous iterations. We highlight a few details of our approach to achieving the best communication rates and ensuring accurate convergence detection. Experimental results on two supercomputers confirm the low communication overhead introduced and the effectiveness of our library.
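To make the distinction between classical and asynchronous iterations concrete, the following is a minimal sketch under stated assumptions, not the interface of the proposed library. It simulates two "processes" in plain Python without MPI, each relaxing its own block of a small, strictly diagonally dominant system by Jacobi iteration. The matrix, the `send_every` delay model, and all function names are hypothetical and chosen only for illustration. The stopping test uses the global residual, which the simulation can read directly; in a real asynchronous run no process has such a consistent global view, which is the convergence detection issue mentioned above.

```python
# Toy illustration only: two simulated "processes" each own one block of x and
# solve A x = b by Jacobi relaxation. No MPI is used; all names are hypothetical.

A = [[4.0, 1.0, 0.0, 1.0],
     [1.0, 5.0, 1.0, 0.0],
     [0.0, 1.0, 4.0, 1.0],
     [1.0, 0.0, 1.0, 5.0]]
b = [6.0, 7.0, 6.0, 7.0]   # chosen so that the exact solution is x = [1, 1, 1, 1]
BLOCKS = [[0, 1], [2, 3]]  # rows owned by process 0 and process 1


def relax(rows, view):
    """One Jacobi sweep over `rows`, reading every x_j from `view`."""
    n = len(b)
    return {i: (b[i] - sum(A[i][j] * view[j] for j in range(n) if j != i)) / A[i][i]
            for i in rows}


def residual_norm(x):
    """Max-norm of b - A x, computed with global knowledge of x."""
    n = len(b)
    return max(abs(b[i] - sum(A[i][j] * x[j] for j in range(n))) for i in range(n))


def iterate(asynchronous, tol=1e-10, max_steps=10_000, send_every=3):
    """Return (x, steps). In asynchronous mode, remote data is refreshed only
    every `send_every` steps, so processes keep computing on stale values."""
    x = [0.0] * len(b)                     # true current values
    views = [list(x) for _ in BLOCKS]      # what each process believes x is
    for step in range(1, max_steps + 1):
        updates = [relax(rows, views[p]) for p, rows in enumerate(BLOCKS)]
        for p, upd in enumerate(updates):
            for i, value in upd.items():
                x[i] = value
                views[p][i] = value        # a process always sees its own block
        # Classical iterations exchange data at every step (a synchronization
        # point); the asynchronous variant here only does so periodically.
        if not asynchronous or step % send_every == 0:
            views = [list(x) for _ in BLOCKS]
        # Simplification: a global residual check stands in for a real
        # distributed termination protocol.
        if residual_norm(x) < tol:
            return x, step
    raise RuntimeError("no convergence within max_steps")


if __name__ == "__main__":
    for mode in (False, True):
        solution, steps = iterate(asynchronous=mode)
        print("asynchronous" if mode else "classical", steps, solution)
```

Both modes reach the same solution here because the matrix is strictly diagonally dominant, a standard sufficient condition for asynchronous relaxation with bounded delays to converge. The step counts differ between modes and depend on the assumed delay model.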