We present a straightforward approach to the distributed parallelization of stencil-based xPU applications on a regular staggered grid, implemented in the package ImplicitGlobalGrid.jl. The approach makes it possible to leverage remote direct memory access and enables close-to-ideal weak scaling of real-world applications on thousands of GPUs. Communication costs can easily be hidden behind computation.
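The underlying idea, local arrays that overlap with their neighbours and are kept consistent by a halo update after each stencil step, can be illustrated with a minimal sketch. It is written in plain Python for a 1D, non-staggered array with the "processes" simulated in a single process; it is not the ImplicitGlobalGrid.jl API, and all names (`stencil_step`, `split`, `update_halo`, `gather`) and the overlap of two cells are assumptions made for illustration. It also does not show remote direct memory access or the overlap of communication with computation.

```python
def stencil_step(u, coeff=0.25):
    """One explicit 3-point stencil update; the two boundary cells stay unchanged."""
    new = list(u)
    for i in range(1, len(u) - 1):
        new[i] = u[i] + coeff * (u[i - 1] - 2.0 * u[i] + u[i + 1])
    return new


def split(u_global, nprocs, nx):
    """Cut the global array into nprocs local arrays of nx cells, overlapping by 2 cells."""
    stride = nx - 2  # cells owned by each local array beyond the overlap
    assert len(u_global) == nprocs * stride + 2
    return [u_global[p * stride : p * stride + nx] for p in range(nprocs)]


def update_halo(local_arrays):
    """Refresh each boundary (halo) cell from the neighbour's freshly computed inner cell."""
    for left, right in zip(local_arrays, local_arrays[1:]):
        left[-1] = right[1]   # right neighbour's first inner cell
        right[0] = left[-2]   # left neighbour's last inner cell


def gather(local_arrays):
    """Reassemble the global array, dropping the 2 overlapping cells of each later array."""
    out = list(local_arrays[0])
    for loc in local_arrays[1:]:
        out.extend(loc[2:])
    return out


def run_distributed(u_global, nprocs, nx, nsteps):
    local_arrays = split(u_global, nprocs, nx)
    for _ in range(nsteps):
        local_arrays = [stencil_step(loc) for loc in local_arrays]  # purely local work
        update_halo(local_arrays)                                   # neighbour exchange
    return gather(local_arrays)


def run_single(u_global, nsteps):
    u = list(u_global)
    for _ in range(nsteps):
        u = stencil_step(u)
    return u
```

Because every local array only ever reads its own cells plus one halo cell per side, the decomposed run reproduces the single-array run, which is the property that lets a solver written for one device be distributed with few changes.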