The Fujitsu A64FX ARM-based processor is used in supercomputers such as Fugaku in Japan and Isambard 2 in the UK. It provides an interesting combination of hardware features, including the Scalable Vector Extension (SVE) and native support for reduced-precision floating-point arithmetic. The goal of this paper is to explore the performance of the Julia programming language on the A64FX processor, with a particular focus on reduced precision. We first present a performance study of axpy to verify the compilation pipeline, demonstrating that Julia can match the performance of tuned libraries. We then analyze Message Passing Interface (MPI) scalability and throughput on Fugaku, finding next to no overhead from Julia or its MPI interface. To explore how well Julia can target various floating-point precisions, we present results for ShallowWaters.jl, a shallow water model that can be executed at various levels of precision. Even for such complex applications, Julia's type-flexible programming paradigm offers both productivity and performance.
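To make the type-flexible paradigm concrete, the following is a minimal sketch of an axpy kernel written once and specialized by the compiler for each floating-point type. The function name `generic_axpy!` and the test values are illustrative assumptions, not the benchmark code used in the paper. Whether the loop is actually vectorized with SVE depends on the Julia and LLVM versions and on the target flags.

```julia
# Hypothetical illustration: a single generic axpy kernel, y = a*x + y,
# that works for any floating-point element type T.
function generic_axpy!(a::T, x::AbstractVector{T}, y::AbstractVector{T}) where {T<:AbstractFloat}
    length(x) == length(y) || throw(DimensionMismatch("x and y must have equal length"))
    # @simd permits the compiler to vectorize the loop; muladd lets it
    # use a fused multiply-add instruction where the hardware offers one.
    @inbounds @simd for i in eachindex(x, y)
        y[i] = muladd(a, x[i], y[i])
    end
    return y
end

# The same source is compiled separately for each precision.
for T in (Float16, Float32, Float64)
    x = ones(T, 8)
    y = fill(T(2), 8)
    generic_axpy!(T(3), x, y)
    println(T, " => ", y[1])
end
```

Because the element type is a parameter of the method, switching precision requires changing only the type of the input arrays, not the kernel itself.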