We study the implementation of the even-odd Wilson fermion matrix for lattice QCD simulations on the A64FX architecture. We investigate efficient coding of the stencil operation with two-dimensional packing of data into SIMD vectors. We measure the sustained performance on the supercomputer Fugaku at RIKEN R-CCS and present the profiler results for our code, which show the detailed efficiency of each part of the code and may also signal an unexpected source of slow-down.