Massively parallel Fourier transforms are widely used in computational science, and in particular in computational fluid dynamics, which involves unbounded Poisson problems. In practice, solving these problems is usually the most time-consuming operation, because the underlying distributed Fourier transforms impose an inescapable all-to-all communication pattern. The original flups library tackles this issue with an implementation of the distributed Fourier transform tailor-made for the repeated solution of unbounded Poisson problems. However, this implementation lacks flexibility: it only supports a cell-centered data layout and relies on a plain communication strategy. This work extends the library in two directions. First, the flups implementation is generalized to support a node-centered data layout. Second, three distinct approaches are provided to handle the communications: one all-to-all implementation, and two non-blocking implementations that rely on manual packing and on MPI_Datatype, respectively, to communicate over the network. The proposed software is validated against analytical solutions for unbounded, semi-unbounded, and periodic domains. The performance of the three approaches is then compared against accFFT, another distributed FFT implementation, on a periodic case. Finally, the performance metrics of each implementation are analyzed in detail on several top-tier European HPC infrastructures, using up to 49,152 cores. This work turns flups into a fully production-ready and performant distributed FFT library that supports all the possible types of FFTs together with a flexible data layout. The code is available under a BSD-3 license at github.com/vortexlab-uclouvain/flups.
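The distinction between the cell-centered and node-centered data layouts can be illustrated with a small sketch. It assumes a one-dimensional domain [0, L] split into N uniform cells of size h = L/N. The function names `cell_centered` and `node_centered`, and the choice to drop the duplicated end node in a periodic direction, are hypothetical conventions of this sketch and not necessarily how flups indexes its data.

```python
def cell_centered(n_cells, length):
    """Sample positions at the centre of each cell of [0, length]: x_i = (i + 1/2) h."""
    h = length / n_cells
    return [(i + 0.5) * h for i in range(n_cells)]


def node_centered(n_cells, length, periodic=False):
    """Sample positions at the cell vertices of [0, length]: x_i = i h.

    A bounded direction stores n_cells + 1 nodes, including both end points.
    Assumption of this sketch: in a periodic direction the node at x = length
    duplicates the node at x = 0, so only n_cells nodes are kept.
    """
    h = length / n_cells
    n_nodes = n_cells if periodic else n_cells + 1
    return [i * h for i in range(n_nodes)]


print(cell_centered(4, 1.0))                 # [0.125, 0.375, 0.625, 0.875]
print(node_centered(4, 1.0))                 # [0.0, 0.25, 0.5, 0.75, 1.0]
print(node_centered(4, 1.0, periodic=True))  # [0.0, 0.25, 0.5, 0.75]
```

The two layouts also map to different real-to-real transforms: with even-symmetric boundary conditions, cell-centered samples lead to a DCT-II (FFTW's `REDFT10`), whereas node-centered samples lead to a DCT-I (`REDFT00`), whose length includes both end points. Supporting a node-centered layout therefore plausibly involves more than shifting indices.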