Gaussian process (GP) emulators have been used as surrogate models for predicting force fields and molecular potentials, to overcome the computational bottleneck of molecular dynamics simulation. Integrating both atomic forces and energies in predictions was found to be more accurate than using energies alone, yet it requires $O((NM)^3)$ computational operations for computing the likelihood function and making predictions, due to the inversion of a large covariance matrix, where $N$ is the number of atoms and $M$ is the number of simulated configurations in the training sample. This large computational cost limits applications to emulating simulations of small molecules. The computational challenge of using both gradient information and function values in GPs has recently been noted in the statistics and machine learning communities, where conventional approximation methods, such as low-rank decomposition or sparse approximation, may not work well. Here we introduce a new approach, the atomized force field (AFF) model, that integrates both force and energy in the emulator with far fewer computational operations. The drastic reduction in computation is achieved by exploiting the naturally sparse structure of a covariance that satisfies the constraints of energy conservation and permutation symmetry of atoms. The efficient machine learning algorithm extends the range of applications to larger molecules under the same computational budget, with nearly no loss of predictive accuracy. Furthermore, our approach provides uncertainty assessment for predictions of atomic forces and potentials, which is useful for developing a sequential design over the chemical input space, with almost no increase in computational cost.
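
To make the cost of the conventional approach concrete: each training configuration contributes one energy and $3N$ force components, so the joint covariance over energies and forces has $M(3N+1)$ rows, and a dense factorization scales as $O((NM)^3)$. The following is a minimal sketch of that dense joint covariance, not of the AFF model itself. It assumes a generic squared-exponential kernel applied directly to Cartesian coordinates and treats the derivative observations as gradients (forces are negative gradients, which only flips the sign of the cross blocks). The names `joint_cov` and `ell` are illustrative and do not come from the paper.

```python
import numpy as np

def joint_cov(X, ell=1.0):
    """Dense covariance of [f(x_1..x_M), grad f(x_1), ..., grad f(x_M)]
    under k(x, x') = exp(-|x - x'|^2 / (2 ell^2)).

    X has shape (M, d); for a molecule with N atoms, d = 3N (assumed layout).
    """
    M, d = X.shape
    diff = X[:, None, :] - X[None, :, :]                   # diff[a, b] = x_a - x_b
    K = np.exp(-0.5 * np.sum(diff**2, axis=-1) / ell**2)   # value-value block

    n = M * (d + 1)
    C = np.empty((n, n))
    C[:M, :M] = K

    # cov(f(x_a), d f(x_b) / d x_{b,j}) = K_ab * (x_a - x_b)_j / ell^2
    K_fg = K[:, :, None] * diff / ell**2                   # shape (M, M, d)
    C[:M, M:] = K_fg.reshape(M, M * d)
    C[M:, :M] = C[:M, M:].T

    # cov(d f(x_a)/d x_{a,i}, d f(x_b)/d x_{b,j})
    #   = K_ab * (delta_ij / ell^2 - (x_a - x_b)_i (x_a - x_b)_j / ell^4)
    outer = diff[:, :, :, None] * diff[:, :, None, :]      # shape (M, M, d, d)
    K_gg = K[:, :, None, None] * (np.eye(d) / ell**2 - outer / ell**4)
    C[M:, M:] = K_gg.transpose(0, 2, 1, 3).reshape(M * d, M * d)
    return C

# Toy sizes (hypothetical): N = 3 atoms, M = 5 configurations.
N_atoms, M = 3, 5
rng = np.random.default_rng(0)
X = rng.normal(size=(M, 3 * N_atoms))
C = joint_cov(X)
print(C.shape)  # (50, 50), i.e. M * (3N + 1) rows and columns
```

Even in this toy setting the matrix has $M(3N+1) = 50$ rows for only five configurations; doubling either $N$ or $M$ roughly doubles the dimension and multiplies the cost of a dense Cholesky factorization by about eight.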