The electron density of a molecule or material has recently received major attention as a target quantity of machine-learning models. A natural choice to construct a model that yields transferable and linear-scaling predictions is to represent the scalar field using a multi-centered atomic basis analogous to that routinely used in density fitting approximations. However, the non-orthogonality of the basis poses challenges for the learning exercise, as it requires accounting for all the atomic density components at once. We devise a gradient-based approach to directly minimize the loss function of the regression problem in an optimized and highly sparse feature space. In so doing, we overcome the limitations associated with adopting an atom-centered model to learn the electron density over arbitrarily complex datasets, obtaining extremely accurate predictions. The enhanced framework is tested on 32-molecule periodic cells of liquid water, presenting enough complexity to require an optimal balance between accuracy and computational efficiency. We show that starting from the predicted density a single Kohn-Sham diagonalization step can be performed to access total energy components that carry an error of just 0.1 meV/atom with respect to the reference density functional calculations. Finally, we test our method on the highly heterogeneous QM9 benchmark dataset, showing that a small fraction of the training data is enough to derive ground-state total energies within chemical accuracy.
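To illustrate why a non-orthogonal basis couples all density coefficients in the loss, the following is a minimal sketch, not the implementation used in this work. It assumes a linear model `c = Phi @ w` for the expansion coefficients, a symmetric positive-definite overlap matrix `S`, and a ridge regularization strength `eta`. All names and the random toy data are hypothetical. The loss `(c - c_ref)^T S (c - c_ref) + eta * |w|^2` is minimized with a plain conjugate-gradient iteration, used here as one possible gradient-based scheme.

```python
import numpy as np

def density_loss(w, Phi, S, c_ref, eta):
    """Overlap-weighted squared error plus ridge penalty (toy form, assumed)."""
    r = Phi @ w - c_ref
    return r @ S @ r + eta * (w @ w)

def loss_gradient(w, Phi, S, c_ref, eta):
    """Gradient of density_loss with respect to the weights w."""
    r = Phi @ w - c_ref
    return 2.0 * (Phi.T @ (S @ r) + eta * w)

def minimize_cg(Phi, S, c_ref, eta, tol=1e-8, max_iter=1000):
    """Conjugate-gradient minimization of the quadratic loss.

    The off-diagonal entries of S couple every coefficient, so the
    residual is always formed over the full coefficient vector.
    """
    w = np.zeros(Phi.shape[1])
    # Residual of the normal equations; equals minus half the loss gradient.
    r = -0.5 * loss_gradient(w, Phi, S, c_ref, eta)
    p = r.copy()
    rr = r @ r
    for _ in range(max_iter):
        if np.sqrt(rr) < tol:
            break
        # Apply (Phi^T S Phi + eta I) to p without building the matrix.
        Ap = Phi.T @ (S @ (Phi @ p)) + eta * p
        alpha = rr / (p @ Ap)
        w = w + alpha * p
        r = r - alpha * Ap
        rr_new = r @ r
        p = r + (rr_new / rr) * p
        rr = rr_new
    return w

# Toy data (purely illustrative, not taken from the paper).
rng = np.random.default_rng(0)
n_coeff, n_feat = 40, 8
B = rng.normal(size=(n_coeff, n_coeff))
S = B @ B.T / n_coeff + np.eye(n_coeff)   # symmetric positive-definite overlap
Phi = rng.normal(size=(n_coeff, n_feat))  # hypothetical feature matrix
c_ref = rng.normal(size=n_coeff)          # hypothetical reference coefficients
eta = 1e-3

w_opt = minimize_cg(Phi, S, c_ref, eta)
```

Because the toy loss is quadratic, the result can be checked against a direct solve of `(Phi.T @ S @ Phi + eta * I) w = Phi.T @ S @ c_ref`. An iterative scheme of this kind avoids forming and inverting that matrix explicitly.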