Extrapolation to complete basis-set limit in density-functional theory by quantile random-forest models

The numerical precision of density-functional-theory (DFT) calculations depends on a variety of computational parameters, one of the most critical being the basis-set size. The ultimate precision is reached with an infinitely large basis set, i.e., in the limit of a complete basis set (CBS). Our aim in this work is to find a machine-learning model that extrapolates finite basis-size calculations to the CBS limit. We start with a data set of 63 binary solids investigated with two all-electron DFT codes, exciting and FHI-aims, which employ very different types of basis sets. A quantile-random-forest model is used to estimate the total-energy correction with respect to a fully converged calculation as a function of the basis-set size. The random-forest model achieves a symmetric mean absolute percentage error of lower than 25% for both codes and outperforms previous approaches in the literature. Our approach also provides prediction intervals, which quantify the uncertainty of the models' predictions.

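Summary: This work seeks a machine-learning model that extrapolates calculations performed with a finite basis-set size to the complete-basis-set limit. A data set of 63 binary solids is studied with two all-electron DFT codes, exciting and FHI-aims, which use very different types of basis sets. A quantile-random-forest model estimates the total-energy correction relative to a fully converged calculation as a function of the basis-set size. For both codes the model reaches a symmetric mean absolute percentage error below 25% and outperforms earlier approaches from the literature. The method also yields prediction intervals that quantify the uncertainty of the model's predictions.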
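The two quantities named in the abstract, the symmetric mean absolute percentage error and a quantile-based prediction interval, can be illustrated with a minimal Python sketch. Several things here are assumptions rather than details taken from the paper: the abstract does not say which sMAPE variant is used, so the common form with the mean of the absolute values in the denominator is assumed; the function names `smape`, `empirical_quantile`, and `prediction_interval` are hypothetical; and the interval is taken from the unweighted empirical quantiles of a pool of target values, whereas a real quantile random forest weights training targets by how often they share a leaf with the query point.

```python
def smape(actual, predicted):
    """Symmetric mean absolute percentage error in percent (range 0 to 200).

    Assumed variant: 100/n * sum(|p - a| / ((|a| + |p|) / 2)).
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for a, p in zip(actual, predicted):
        denom = (abs(a) + abs(p)) / 2.0
        # Assumed convention: a pair with a == p == 0 contributes zero error.
        total += 0.0 if denom == 0.0 else abs(p - a) / denom
    return 100.0 * total / len(actual)


def empirical_quantile(values, q):
    """Quantile of a sample by linear interpolation, with q in [0, 1]."""
    if not values or not 0.0 <= q <= 1.0:
        raise ValueError("need a non-empty sample and q in [0, 1]")
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (pos - lo) * (ordered[hi] - ordered[lo])


def prediction_interval(pooled_targets, lower_q=0.05, upper_q=0.95):
    """Interval between two empirical quantiles of the pooled target values.

    Simplification: every pooled value has equal weight (hypothetical stand-in
    for the leaf-weighted distribution of a quantile random forest).
    """
    return (empirical_quantile(pooled_targets, lower_q),
            empirical_quantile(pooled_targets, upper_q))


if __name__ == "__main__":
    # Made-up numbers for illustration only, not data from the paper.
    reference = [0.10, 0.20, 0.40]
    estimate = [0.12, 0.18, 0.40]
    print(f"sMAPE = {smape(reference, estimate):.2f}%")
    print("interval:", prediction_interval([1.0, 2.0, 3.0, 4.0, 5.0], 0.25, 0.75))
```

A wide interval for a given material and basis-set size would signal that the extrapolated energy correction should be treated with caution, which is the role the abstract assigns to the prediction intervals.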