Data-driven methods based on machine learning have the potential to accelerate the analysis of atomic structures. However, machine learning models can produce overconfident predictions, so it is crucial to detect and handle uncertainty carefully. Here, we extend a message passing neural network designed specifically for predicting properties of molecules and materials with a calibrated probabilistic predictive distribution. The method differs from previous work by considering both aleatoric and epistemic uncertainty in a unified framework, and by recalibrating the predictive distribution on unseen data. Through computer experiments on two public molecular benchmark datasets, QM9 and PC9, we show that our approach yields accurate models for predicting molecular formation energies, with calibrated uncertainty both in and out of the training data distribution. The proposed method provides a general framework for training and evaluating neural network ensemble models that produce accurate predictions of molecular properties with calibrated uncertainty.
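To make the two ingredients named above concrete, the following is a minimal sketch, not the implementation used in this work. It assumes each ensemble member outputs a Gaussian mean and variance for a molecule, that the ensemble is summarized by the mean and variance of the resulting mixture, and that recalibration fits a single variance scaling factor on held-out data. The abstract does not specify these details, and the function names `combine_ensemble` and `fit_variance_scale` are hypothetical.

```python
from statistics import fmean, pvariance


def combine_ensemble(means, variances):
    """Combine per-member Gaussian predictions for one molecule.

    Assumption: the ensemble is treated as a uniform mixture of Gaussians,
    so the total variance splits into two terms.
    """
    mean = fmean(means)
    aleatoric = fmean(variances)   # average predicted noise variance
    epistemic = pvariance(means)   # disagreement between ensemble members
    return mean, aleatoric + epistemic


def fit_variance_scale(targets, pred_means, pred_variances):
    """Fit a scalar s so that N(mean, s * variance) is better calibrated.

    Assumption: s minimizes the Gaussian negative log-likelihood on held-out
    data, which has the closed form s = mean of squared z-scores.
    """
    z_squared = [
        (y - m) ** 2 / v
        for y, m, v in zip(targets, pred_means, pred_variances)
    ]
    return fmean(z_squared)


if __name__ == "__main__":
    # Three hypothetical ensemble members predicting one formation energy.
    mean, total_var = combine_ensemble([1.0, 2.0, 3.0], [0.1, 0.2, 0.3])
    print(mean, total_var)

    # Hypothetical held-out targets and ensemble predictions.
    s = fit_variance_scale([1.0, 2.0], [0.0, 2.0], [0.5, 1.0])
    print(s)
```

A fitted scale above 1 indicates that the uncalibrated variances were too small (overconfident) on the held-out data, and a scale below 1 indicates they were too large.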