To improve the efficiency of Gaussian integral evaluation on modern accelerated architectures, FLOP-efficient Obara-Saika-based recursive evaluation schemes are optimized for memory footprint. For the 3-center 2-particle integrals that are key to the evaluation of Coulomb and other 2-particle interactions in the density-fitting approximation, the use of multi-quantal recurrences (in which multiple quanta are created or transferred at once) is shown to produce significant memory savings. Other innovations include leveraging register memory to reduce the memory footprint and generating optimized kernels directly at compile time (instead of custom code generation) with the compile-time features of modern C++/CUDA. Performance of conventional and CUDA-based implementations of the proposed schemes is illustrated both for individual batches of integrals involving Gaussians with low and high angular momenta (up to $L=6$) and contraction degrees, and for the density-fitting-based evaluation of the Coulomb potential. The computer implementation is available in the open-source LibintX library.
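As a rough illustration of what compile-time generation can look like, the following C++17 sketch derives shell sizes and a Cartesian exponent table entirely from template parameters, with no separate code generator. It is not taken from LibintX: the names `ncart`, `npure`, `cartesian_exponents`, `exponent`, and `batch3_size` are hypothetical, and the component ordering shown is only one common convention. The component counts themselves, $(L+1)(L+2)/2$ for Cartesian and $2L+1$ for pure (solid-harmonic) Gaussians, are standard.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Number of Cartesian Gaussian components with angular momentum L.
constexpr std::size_t ncart(std::size_t L) { return (L + 1) * (L + 2) / 2; }

// Number of pure (solid-harmonic) components with angular momentum L.
constexpr std::size_t npure(std::size_t L) { return 2 * L + 1; }

// Build the table of Cartesian exponents (lx, ly, lz), lx + ly + lz = L,
// at compile time. The ordering (lx descending, then ly descending) is an
// assumption made for this sketch, not a statement about any library.
template <std::size_t L>
constexpr std::array<std::array<int, 3>, ncart(L)> cartesian_exponents() {
  std::array<std::array<int, 3>, ncart(L)> e{};
  std::size_t i = 0;
  const int l = static_cast<int>(L);
  for (int lx = l; lx >= 0; --lx) {
    for (int ly = l - lx; ly >= 0; --ly) {
      e[i][0] = lx;
      e[i][1] = ly;
      e[i][2] = l - lx - ly;
      ++i;
    }
  }
  return e;
}

// Runtime lookup into the compile-time table, with a bounds check.
template <std::size_t L>
std::array<int, 3> exponent(std::size_t i) {
  static constexpr auto table = cartesian_exponents<L>();
  assert(i < table.size());
  return table[i];
}

// Hypothetical count of Cartesian components in one 3-center batch with
// angular momenta (LA, LB, LX); known at compile time, so a kernel could
// size fixed local storage from it.
template <std::size_t LA, std::size_t LB, std::size_t LX>
constexpr std::size_t batch3_size = ncart(LA) * ncart(LB) * ncart(LX);

// Everything above is checked by the compiler, not at run time.
static_assert(ncart(6) == 28, "i-type shell has 28 Cartesian components");
static_assert(npure(6) == 13, "i-type shell has 13 pure components");
static_assert(cartesian_exponents<1>()[0][0] == 1, "first p component is x");
static_assert(batch3_size<6, 6, 6> == 21952, "28^3 components for L=6");
```

Because every size is a constant expression, a compiler is free to unroll loops over these tables and, for small cases, keep intermediates in registers; whether it does so depends on the target and the compiler.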