3D FFTs are used to accelerate the electrostatic force computations in molecular dynamics (MD), but they are difficult to parallelize because of their communication requirements. We present a distributed OpenCL 3D FFT implementation on Intel Stratix 10 FPGAs for grids up to {\boldmath $128^3$}. We use FPGA hardware features such as HBM2 memory and multiple 100 Gbps links to provide scalable memory access and communication. Our implementation outperforms GPUs for smaller FFTs, even without distribution. For {\boldmath $32^3$}, we achieve 4.4 microseconds on a single FPGA, similar to Anton 1 on 512 nodes. With 8 parallel pipelines (the limit imposed by the hardware), we reach the same performance both locally and distributed, showing that communication does not limit performance. Our FFT implementation is designed to be part of the electrostatic force pipeline of a scalable MD engine.