Data-driven methods based on machine learning have the potential to accelerate computational analysis of atomic structures. In this context, reliable uncertainty estimates are important for assessing confidence in predictions and enabling decision making. However, machine learning models can produce poorly calibrated uncertainty estimates, and it is therefore crucial to detect and handle uncertainty carefully. In this work, we extend a message passing neural network designed specifically for predicting properties of molecules and materials with a calibrated probabilistic predictive distribution. The method presented in this paper differs from previous work by considering both aleatoric and epistemic uncertainty in a unified framework, and by recalibrating the predictive distribution on unseen data. Through computer experiments on two public molecular benchmark datasets, QM9 and PC9, we show that our approach yields accurate models for predicting molecular formation energies, with well-calibrated uncertainty both inside and outside the training data distribution. The proposed method provides a general framework for training and evaluating neural network ensemble models that produce accurate predictions of molecular properties with well-calibrated uncertainty estimates.
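To make the decomposition into aleatoric and epistemic uncertainty concrete, the following is a minimal sketch and not the implementation used in this work. It assumes each ensemble member outputs a Gaussian mean and variance per molecule, combines them as a uniform mixture, and recalibrates with a single variance scaling factor fitted on held-out data by minimizing the Gaussian negative log-likelihood. The function names `combine_ensemble` and `fit_variance_scale` are hypothetical, and the abstract does not specify the recalibration procedure, so the scaling step is an assumption.

```python
import numpy as np


def combine_ensemble(means, variances):
    """Combine per-member Gaussian predictions into one predictive distribution.

    means, variances: arrays of shape (n_members, n_molecules).
    Returns (mean, aleatoric, epistemic, total), each of shape (n_molecules,).
    """
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    mean = means.mean(axis=0)
    # Aleatoric part: average of the variances predicted by the members.
    aleatoric = variances.mean(axis=0)
    # Epistemic part: disagreement between members (population variance of means).
    epistemic = means.var(axis=0)
    return mean, aleatoric, epistemic, aleatoric + epistemic


def fit_variance_scale(y_true, mean, total_variance):
    """Assumed recalibration: scalar s2 such that s2 * total_variance minimizes
    the Gaussian negative log-likelihood on held-out data (closed form)."""
    y_true = np.asarray(y_true, dtype=float)
    return float(np.mean((y_true - mean) ** 2 / total_variance))


if __name__ == "__main__":
    # Toy numbers for illustration only: 2 ensemble members, 2 molecules.
    member_means = np.array([[1.0, 2.0], [3.0, 2.0]])
    member_vars = np.array([[0.5, 0.1], [0.5, 0.3]])
    mu, alea, epi, total = combine_ensemble(member_means, member_vars)
    s2 = fit_variance_scale(np.array([3.5, 2.0]), mu, total)
    print("mean:", mu, "aleatoric:", alea, "epistemic:", epi)
    print("recalibrated variance:", s2 * total)
```

A scaling factor above 1 indicates that the ensemble was overconfident on the held-out data, and a factor below 1 indicates that it was underconfident. A single scalar is the simplest possible choice; more flexible recalibration maps are equally compatible with the framework described above.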