Uncertainty estimation for molecular dynamics and sampling

Machine learning models have emerged as a very effective strategy to sidestep time-consuming electronic-structure calculations, enabling accurate simulations of greater size, time scale and complexity. Because these models are interpolative, the reliability of their predictions depends on the position in phase space, and it is crucial to estimate the error that arises from the finite number of reference structures included in the training of the model. When a machine-learning potential is used to sample a finite-temperature ensemble, the uncertainty on individual configurations translates into an error on thermodynamic averages, and also indicates the loss of accuracy when the simulation enters a previously unexplored region. Here we discuss how uncertainty quantification can be used, together with a baseline energy model, or a more robust although less accurate interatomic potential, to obtain more resilient simulations and to support active-learning strategies. Furthermore, we introduce an on-the-fly reweighting scheme that makes it possible to estimate the uncertainty in the thermodynamic averages extracted from long trajectories. We present examples covering different types of structural and thermodynamic properties, and systems as diverse as water and liquid gallium.
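
The abstract does not spell out how the reweighting is carried out, so the following is only a minimal sketch of one plausible reading. It assumes (this is not stated above) that the uncertainty comes from a committee of models, that the trajectory is driven by the committee-mean potential, and that each member's thermodynamic average is recovered by standard Boltzmann reweighting with weights proportional to exp(-β(V_i - V_mean)). The function names `reweighted_averages` and `committee_uncertainty` are hypothetical, and the harmonic toy system is purely illustrative.

```python
import numpy as np


def reweighted_averages(obs, v_members, beta):
    """Per-member ensemble averages of `obs` via Boltzmann reweighting.

    obs       : (n_frames,) observable along a trajectory that is assumed
                to sample the committee-mean potential
    v_members : (n_models, n_frames) potential energy of every frame as
                predicted by each committee member (an assumption of this
                sketch, not something the abstract specifies)
    beta      : inverse temperature 1 / (k_B T), in units matching v_members
    """
    obs = np.asarray(obs, dtype=float)
    v_members = np.asarray(v_members, dtype=float)
    v_mean = v_members.mean(axis=0)
    # Log-weight of each frame for each member, relative to the mean potential.
    log_w = -beta * (v_members - v_mean)
    # Subtract the per-member maximum before exponentiating to avoid overflow.
    log_w -= log_w.max(axis=1, keepdims=True)
    w = np.exp(log_w)
    w /= w.sum(axis=1, keepdims=True)
    return w @ obs  # shape (n_models,)


def committee_uncertainty(obs, v_members, beta):
    """Return the per-member averages and their spread (sample std. dev.)."""
    member_avgs = reweighted_averages(obs, v_members, beta)
    return member_avgs, member_avgs.std(ddof=1)


if __name__ == "__main__":
    # Toy system: 1D harmonic wells with slightly different stiffness per member.
    rng = np.random.default_rng(0)
    beta = 1.0
    k = np.array([0.9, 1.0, 1.1])  # hypothetical committee of three models
    # Frames sampled from the Boltzmann distribution of the mean potential.
    x = rng.normal(scale=1.0 / np.sqrt(beta * k.mean()), size=200_000)
    v = 0.5 * k[:, None] * x[None, :] ** 2
    avgs, spread = committee_uncertainty(x**2, v, beta)
    print("per-member <x^2>:", avgs)
    print("committee spread:", spread)
```

In this toy case each member's reweighted average of x² should approach the analytic value 1/(βk_i), and the spread across members serves as the uncertainty estimate on the thermodynamic average. Reweighting of this kind is only reliable while the member potentials stay close to the mean potential; when they diverge, a few frames dominate the weights and the estimate becomes noisy.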
