Massively parallel Fourier transforms are widely used in the computational sciences, and in particular in computational fluid dynamics, where they are needed to solve unbounded Poisson problems. In practice, the transform is usually the most time-consuming operation because of its inescapable all-to-all communication pattern. The original flups library tackles this issue with an implementation of the distributed Fourier transform tailor-made for successive resolutions of unbounded Poisson problems. However, that implementation lacks flexibility: it supports only a cell-centered data layout and features a plain communication strategy. This work extends the library in two directions. First, the flups implementation is generalized to support a node-centered data layout. Second, three distinct approaches are provided to handle the communications: one all-to-all implementation, and two non-blocking implementations that rely on manual packing and on MPI_Datatype, respectively, to communicate over the network. The proposed software is validated against analytical solutions for unbounded, semi-unbounded, and periodic domains. The performance of the three approaches is then compared against accFFT, another distributed FFT implementation, using a periodic case. Finally, the performance metrics of each implementation are analyzed in detail on several top-tier European facilities, on up to 49,152 cores. This work makes flups a fully production-ready and performant distributed FFT library that supports all the possible types of FFTs and offers flexibility in the data layout. The code is available under a BSD-3 license at github.com/vortexlab-uclouvain/flups.
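To make the validation on a periodic domain concrete, the following is a minimal single-process sketch in Python. It is not the flups API and not the authors' test case. It solves the 1D problem u'' = f spectrally and compares the result with the analytical solution u(x) = sin(2πx/L), once on cell-centered samples and once on node-centered samples. The function names, the naive O(N²) DFT used as a stand-in for a real FFT, the grid size, and the convention that a periodic node-centered grid omits the duplicated end point are all assumptions made for this illustration.

```python
import cmath
import math


def dft(values, inverse=False):
    """Naive O(N^2) discrete Fourier transform (stand-in for a real FFT)."""
    n = len(values)
    sign = 1.0 if inverse else -1.0
    out = []
    for k in range(n):
        acc = 0j
        for j, v in enumerate(values):
            acc += v * cmath.exp(sign * 2j * math.pi * j * k / n)
        # The 1/N normalization is applied on the inverse transform only.
        out.append(acc / n if inverse else acc)
    return out


def solve_periodic_poisson(rhs, length):
    """Solve u'' = rhs on a periodic domain; returns the zero-mean solution."""
    n = len(rhs)
    rhs_hat = dft(rhs)
    u_hat = [0j] * n  # mode 0 stays at zero: the mean of u is fixed to 0
    for k in range(1, n):
        m = k if k <= n // 2 else k - n  # signed mode index
        wavenumber = 2.0 * math.pi * m / length
        u_hat[k] = -rhs_hat[k] / wavenumber**2
    return [value.real for value in dft(u_hat, inverse=True)]


def max_error(n, offset, length=1.0):
    """Max-norm error against u(x) = sin(2*pi*x/L).

    offset = 0.5 samples at cell centers, offset = 0.0 at nodes
    (hypothetical convention chosen for this sketch).
    """
    h = length / n
    xs = [(i + offset) * h for i in range(n)]
    w = 2.0 * math.pi / length
    rhs = [-(w**2) * math.sin(w * x) for x in xs]
    u = solve_periodic_poisson(rhs, length)
    return max(abs(ui - math.sin(w * x)) for ui, x in zip(u, xs))


cell_error = max_error(16, 0.5)
node_error = max_error(16, 0.0)
print("cell-centered max error:", cell_error)
print("node-centered max error:", node_error)
```

In this periodic setting the two layouts differ only by a half-cell shift of the sample points, so both errors are expected to sit at round-off level. The unbounded and semi-unbounded cases treated by the library require a different treatment that this sketch does not attempt to reproduce.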