Rapid Exploration of Optimization Strategies on Advanced Architectures using TestSNAP and LAMMPS

The exascale race is at an end with the announcement of the Aurora and Frontier machines. This next generation of supercomputers utilizes diverse hardware architectures to achieve its compute performance, placing an added onus on the performance portability of applications. An expanding fragmentation of programming models would pose a compounding optimization challenge were it not for the evolution of performance-portable frameworks, which provide unified models for mapping abstract hierarchies of parallelism to diverse architectures. Kokkos is one such performance-portable programming model for C++ applications, providing back-end implementations for each major HPC platform. Even with a performance-portable framework, restructuring algorithms to expose higher degrees of parallelism is non-trivial. The Spectral Neighbor Analysis Potential (SNAP) is a machine-learned interatomic potential used in cutting-edge molecular dynamics simulations. Previous implementations of the SNAP calculation showed a downward trend in performance relative to peak on newer-generation CPUs and low performance on GPUs. In this paper we describe the restructuring and optimization of SNAP as implemented in the Kokkos CUDA backend of the LAMMPS molecular dynamics package, benchmarked on NVIDIA GPUs. We identify novel patterns of hierarchical parallelism, facilitating a minimization of memory access overheads and pushing the implementation into a compute-saturated regime. Our implementation via Kokkos enables recompile-and-run efficiency on upcoming architectures. We find a $\sim$22x time-to-solution improvement relative to an existing implementation as measured on an NVIDIA Tesla V100-16GB for an important benchmark.
