Graph neural networks are emerging as promising methods for modeling molecular graphs, in which nodes and edges correspond to atoms and chemical bonds, respectively. Recent studies show that molecular property prediction becomes more accurate when 3D molecular geometries, such as bond lengths and angles, are available. However, computing 3D molecular geometries requires quantum calculations that are computationally prohibitive. For example, accurately calculating the 3D geometry of a small molecule with density functional theory (DFT) takes hours of computing time. Here, we propose to predict ground-state 3D geometries from molecular graphs using machine learning methods. To make this feasible, we develop a benchmark, Molecule3D, that includes a dataset of precise ground-state geometries for approximately 4 million molecules derived from DFT. We also provide a set of software tools for data processing, splitting, training, and evaluation. Specifically, we propose to assess the error and validity of predicted geometries using four metrics. We implement two baseline methods that predict either the pairwise distances between atoms or the atom coordinates in 3D space. Experimental results show that, compared with generating 3D geometries with RDKit, our method can achieve comparable prediction accuracy at a much lower computational cost. Molecule3D is available as a module of the MoleculeX software library (https://github.com/divelab/MoleculeX).
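As an illustration of how a predicted geometry might be compared with a DFT reference, the sketch below computes the mean absolute error over all pairwise interatomic distances. This is only an assumed example of a distance-based error: the abstract does not name the four metrics, and the function names `pairwise_distances` and `distance_mae` are hypothetical, not part of Molecule3D or MoleculeX. The coordinates are toy values, not data from the benchmark.

```python
import math
from itertools import combinations


def pairwise_distances(coords):
    """Return {(i, j): Euclidean distance} for every atom pair with i < j."""
    return {
        (i, j): math.dist(coords[i], coords[j])
        for i, j in combinations(range(len(coords)), 2)
    }


def distance_mae(pred_coords, true_coords):
    """Mean absolute error between predicted and reference pairwise distances.

    Hypothetical metric for illustration; not necessarily one of the
    four metrics used by the benchmark.
    """
    if len(pred_coords) != len(true_coords):
        raise ValueError("geometries must contain the same number of atoms")
    pred = pairwise_distances(pred_coords)
    true = pairwise_distances(true_coords)
    if not true:  # fewer than two atoms: no pairs to compare
        return 0.0
    return sum(abs(pred[pair] - true[pair]) for pair in true) / len(true)


# Toy three-atom geometries (arbitrary units, not benchmark data).
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
shifted = [(x + 5.0, y + 5.0, z + 5.0) for x, y, z in reference]
stretched = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (0.0, 1.0, 0.0)]

print(distance_mae(shifted, reference))    # translation leaves distances unchanged
print(distance_mae(stretched, reference))  # stretching one bond increases the error
```

Because pairwise distances do not change under rotation or translation, an error defined on them does not require aligning the predicted and reference structures first, which is one reason a distance-based comparison is convenient for this task.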