Electronic structure calculations based on density-functional theory (DFT) account for a significant share of today's high-performance computing (HPC) workloads and place high demands on HPC resources. Performing these quantum-mechanical DFT calculations on complex, large-scale systems requires so-called linear-scaling methods rather than conventional cubic-scaling methods. In this work, we take up the idea of the submatrix method and apply it to the DFT computations in the software package CP2K. To this end, we transform the underlying numerical operations on large, distributed, sparse matrices into computations on local, much smaller, and nearly dense matrices. This allows us to exploit the full floating-point performance of modern CPUs and to make use of dedicated accelerator hardware, whereas performance was previously limited by memory bandwidth. We demonstrate both the functionality and the performance of our implementation and show how it can be accelerated with GPUs and FPGAs.
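The abstract does not spell out how sparse operations become small dense ones. As a minimal sketch, assuming the usual formulation of the submatrix method (one dense principal submatrix per column, selected by that column's nonzero pattern), the idea can be illustrated as follows. The names `dense_inverse` and `submatrix_apply` are hypothetical, matrix inversion merely stands in for whichever matrix function the DFT code evaluates, and nothing here reflects the actual CP2K implementation.

```python
def dense_inverse(m):
    """Invert a small dense matrix by Gauss-Jordan elimination (partial pivoting).

    Illustrative stand-in for the dense kernel; assumes m is nonsingular.
    """
    n = len(m)
    # Augment m with the identity: [m | I]
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def submatrix_apply(a, f=dense_inverse):
    """Approximate f(a) column by column using small dense submatrices.

    Assumption: `a` is a square list-of-lists with a nonzero diagonal,
    where exact zeros mark the sparsity pattern.
    """
    n = len(a)
    result = [[0.0] * n for _ in range(n)]
    for j in range(n):
        # The nonzero rows of column j select the principal submatrix.
        idx = [i for i in range(n) if a[i][j] != 0.0]
        sub = [[a[r][c] for c in idx] for r in idx]
        # Dense, local computation: independent of all other columns.
        f_sub = f(sub)
        # Position of column j inside its own submatrix.
        local_j = idx.index(j)
        # Scatter the matching column back, keeping the original sparsity pattern.
        for k, i in enumerate(idx):
            result[i][j] = f_sub[k][local_j]
    return result
```

Each column's submatrix can be processed independently, which is what makes the dense kernel a natural candidate for offloading. For a block-diagonal input such as `[[4, 1, 0, 0], [1, 3, 0, 0], [0, 0, 2, 1], [0, 0, 1, 2]]` the sketch reproduces the exact inverse; for general sparse matrices the result is only an approximation.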