We push the boundaries of electronic structure-based \textit{ab-initio} molecular dynamics (AIMD) beyond 100 million atoms, a scale that is otherwise barely reachable with classical force-field methods or novel neural-network and machine-learning potentials. We achieve this breakthrough by combining innovations in linear-scaling AIMD, efficient and approximate sparse linear algebra, low- and mixed-precision floating-point computation on GPUs, and a compensation scheme for the errors introduced by numerical approximations. The core of our work is the non-orthogonalized local submatrix (NOLSM) method, which scales very favorably on massively parallel computing systems and translates large sparse matrix operations into highly parallel, dense matrix operations that are ideally suited to hardware accelerators. We demonstrate that the NOLSM method, which is the central operation of each AIMD step, achieves a sustained performance of 324 PFLOP/s in mixed FP16/FP32 precision, corresponding to an efficiency of 67.7\% when running on 1536 NVIDIA A100 GPUs.
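To illustrate how a submatrix approach turns one large sparse matrix function into many small dense ones, the following is a minimal sketch of the generic submatrix method applied to the matrix sign function of a symmetric matrix. It is not the NOLSM algorithm itself: the non-orthogonal basis, the mixed FP16/FP32 arithmetic, and the error compensation are all omitted. The function names `sign_newton_schulz` and `submatrix_sign`, the choice of a Newton-Schulz iteration for the dense kernel, and the dense storage of the "sparse" input are illustrative assumptions.

```python
import numpy as np


def sign_newton_schulz(a: np.ndarray, iters: int = 100) -> np.ndarray:
    """Dense matrix sign function of a symmetric matrix (hypothetical helper).

    Uses only dense matrix multiplications, the kind of kernel that maps well
    onto GPU tensor cores. Assumes `a` has no eigenvalue at or near zero.
    """
    n = a.shape[0]
    x = a / np.linalg.norm(a, 2)  # scale eigenvalues into [-1, 1]
    eye = np.eye(n)
    for _ in range(iters):
        # Newton-Schulz step: X <- X (3I - X^2) / 2, converges to sign(A)
        x = 0.5 * x @ (3.0 * eye - x @ x)
    return x


def submatrix_sign(a: np.ndarray, tol: float = 0.0) -> np.ndarray:
    """Approximate sign(a) column by column with the submatrix method.

    For column i, the index set is the nonzero pattern of a[:, i]. The dense
    submatrix a[idx, idx] is processed on its own, and the column of the
    result that corresponds to i is scattered back into column i of the
    output. The sparse input is stored densely here only for brevity.
    """
    n = a.shape[0]
    result = np.zeros_like(a)
    for i in range(n):
        idx = np.flatnonzero(np.abs(a[:, i]) > tol)  # sorted row indices
        if i not in idx:  # always include the diagonal position
            idx = np.sort(np.append(idx, i))
        sub = a[np.ix_(idx, idx)]       # small dense submatrix
        f_sub = sign_newton_schulz(sub)  # independent dense computation
        local = np.searchsorted(idx, i)  # position of i inside idx
        result[idx, i] = f_sub[:, local]
    return result
```

Because every submatrix is independent, the loop over columns is embarrassingly parallel and each iteration is a dense kernel, which is why this family of methods suits accelerators. For a block-diagonal input the result is exact; for general sparse inputs it is an approximation whose quality depends on the decay of the matrix function. A production code would also avoid storing the input densely and could group several columns into one submatrix.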