We present a high-performance evaluation method for 4-center 2-particle integrals over Gaussian atomic orbitals (AOs) with high angular momenta ($l\geq4$) and arbitrary contraction degrees on graphics processing units (GPUs) and other accelerators. The implementation uses the matrix form of the McMurchie-Davidson recurrences. Evaluation of the 4-center integrals over four $l=6$ ($i$) Gaussian AOs in double precision (FP64) on an NVIDIA V100 GPU outperforms the reference implementation of the Obara-Saika recurrences (${\tt Libint}$) running on a single Intel Xeon core by more than a factor of 1000, comfortably exceeding the 73:1 ratio of the respective hardware peak FLOP rates while reaching almost 50\% of the V100 peak. The approach can be extended to support AOs with even higher angular momenta; for low angular momenta, alternative approaches will be needed to achieve optimal performance. The implementation is part of the open-source ${\tt LibintX}$ library, freely available at ${\tt github.com:ValeevGroup/LibintX}$.
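
As a rough illustration of what the McMurchie-Davidson recurrences compute, the sketch below builds the one-dimensional Hermite expansion coefficients $E_t^{ij}$ of a product of two primitive Gaussians and collects them into a small matrix that maps Hermite Gaussians to Cartesian products. It is a textbook-style, CPU-only example and is not taken from ${\tt LibintX}$; the function names (`hermite_E`, `hermite_matrix`, `overlap_1d`) and the row/column layout of the matrix are assumptions made here for illustration only.

```python
import math


def hermite_E(i, j, t, a, b, A, B):
    """1D McMurchie-Davidson coefficient E_t^{ij} (hypothetical helper).

    Expands (x-A)^i exp(-a (x-A)^2) * (x-B)^j exp(-b (x-B)^2)
    in Hermite Gaussians centered at P = (a*A + b*B) / (a + b).
    """
    p = a + b
    if t < 0 or t > i + j:
        return 0.0  # coefficients outside 0 <= t <= i + j vanish
    if i == 0 and j == 0:
        mu = a * b / p
        return math.exp(-mu * (A - B) ** 2)  # Gaussian product prefactor
    P = (a * A + b * B) / p
    if i > 0:
        # Raise the angular momentum on the first center.
        return (hermite_E(i - 1, j, t - 1, a, b, A, B) / (2.0 * p)
                + (P - A) * hermite_E(i - 1, j, t, a, b, A, B)
                + (t + 1) * hermite_E(i - 1, j, t + 1, a, b, A, B))
    # Raise the angular momentum on the second center.
    return (hermite_E(i, j - 1, t - 1, a, b, A, B) / (2.0 * p)
            + (P - B) * hermite_E(i, j - 1, t, a, b, A, B)
            + (t + 1) * hermite_E(i, j - 1, t + 1, a, b, A, B))


def hermite_matrix(la, lb, a, b, A, B):
    """Matrix with rows (i, j), i <= la, j <= lb, and columns t = 0..la+lb.

    The layout is an assumption for this sketch; applying such a matrix to a
    vector of Hermite-basis quantities yields the Cartesian-basis ones.
    """
    return [[hermite_E(i, j, t, a, b, A, B) for t in range(la + lb + 1)]
            for i in range(la + 1) for j in range(lb + 1)]


def overlap_1d(i, j, a, b, A, B):
    """1D overlap integral: only the t = 0 Hermite term survives integration."""
    return hermite_E(i, j, 0, a, b, A, B) * math.sqrt(math.pi / (a + b))
```

Casting the Hermite-to-Cartesian step as a matrix is what makes it possible, in principle, to express the transformation as dense matrix products, which map well onto GPU hardware; the recursive evaluation above is chosen for clarity rather than speed.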