We present an efficient approach for writing architecture-agnostic, parallel, high-performance stencil computations in Julia, implemented in the package ParallelStencil.jl. Powerful metaprogramming, costless abstractions and multiple dispatch make it possible to write a single code that is suitable both for productive prototyping on a single CPU thread and for production runs on multi-GPU or CPU workstations or supercomputers. For a 3-D heat diffusion solver, we demonstrate performance close to the theoretical upper bound on GPUs, which is a massive improvement over the performance reachable with CUDA.jl array programming.