ChemRL-GEM: Geometry Enhanced Molecular Representation Learning for Property Prediction

Effective molecular representation learning is essential for molecular property prediction, a fundamental task in the drug and materials industries. Recent advances in graph neural networks (GNNs) have shown great promise for molecular representation learning, and several recent studies have successfully applied self-supervised learning to pre-train GNNs and so overcome the shortage of labeled molecules. However, existing GNNs and pre-training strategies usually treat molecules as topological graph data and do not fully exploit molecular geometry. Yet the three-dimensional (3D) spatial structure of a molecule, also known as its molecular geometry, is one of the most critical factors in determining its physical, chemical, and biological properties. To this end, we propose a novel Geometry Enhanced Molecular representation learning method (GEM) for Chemical Representation Learning (ChemRL). First, we design a geometry-based GNN architecture that simultaneously models the atoms, bonds, and bond angles in a molecule. Specifically, we construct two graphs for each molecule: the first encodes the atom-bond relations, and the second encodes the bond-angle relations. Second, on top of this GNN architecture, we propose several novel geometry-level self-supervised learning strategies that learn spatial knowledge from the local and global 3D structures of molecules. We compare ChemRL-GEM with various state-of-the-art (SOTA) baselines on different molecular benchmarks and show that ChemRL-GEM significantly outperforms all baselines on both regression and classification tasks. For example, the experimental results show an average overall improvement of 8.8% over the SOTA baselines on the regression tasks, demonstrating the superiority of the proposed method.
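The two-graph idea can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `build_bond_angle_graph`, the plain-list data layout, and the toy water geometry are all assumptions made for illustration. It assumes the first graph is given as atom coordinates plus a bond list (atoms as nodes, bonds as edges), and derives a second graph in which each bond is a node and two bonds that meet at a shared atom are joined by an edge labeled with the angle between them.

```python
import math
from itertools import combinations


def build_bond_angle_graph(coords, bonds):
    """Return edges (bond_a, bond_b, angle) linking bonds that share one atom.

    coords: list of (x, y, z) atom positions (assumed known, e.g. from a conformer).
    bonds:  list of (i, j) atom-index pairs; this is the edge list of the
            atom-bond graph, and each entry becomes a node in the second graph.
    """
    def unit_vector(src, dst):
        delta = [coords[dst][k] - coords[src][k] for k in range(3)]
        norm = math.sqrt(sum(c * c for c in delta))
        return [c / norm for c in delta]

    angle_edges = []
    for a, b in combinations(range(len(bonds)), 2):
        shared = set(bonds[a]) & set(bonds[b])
        if len(shared) != 1:
            continue  # the two bonds do not meet at a single atom
        center = shared.pop()
        end_a = bonds[a][0] if bonds[a][1] == center else bonds[a][1]
        end_b = bonds[b][0] if bonds[b][1] == center else bonds[b][1]
        u, v = unit_vector(center, end_a), unit_vector(center, end_b)
        cos_angle = sum(x * y for x, y in zip(u, v))
        cos_angle = max(-1.0, min(1.0, cos_angle))  # guard against rounding error
        angle_edges.append((a, b, math.acos(cos_angle)))  # angle in radians
    return angle_edges


# Illustrative water-like geometry (assumed values): O at the origin,
# two H atoms 0.96 angstrom away and 104.5 degrees apart.
theta = math.radians(104.5)
water_coords = [(0.0, 0.0, 0.0), (0.96, 0.0, 0.0),
                (0.96 * math.cos(theta), 0.96 * math.sin(theta), 0.0)]
water_bonds = [(0, 1), (0, 2)]  # atom-bond graph: atoms are nodes, bonds are edges

for bond_a, bond_b, angle in build_bond_angle_graph(water_coords, water_bonds):
    print(bond_a, bond_b, round(math.degrees(angle), 1))
```

In this sketch the bond list doubles as the node list of the second graph, so a GNN could pass messages over both graphs while sharing bond features between them. How GEM actually couples the two graphs is not specified in the abstract.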


Related content

Representation learning


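Representation learning uses training data to learn vector representations, which overcomes the limitations of hand-crafted approaches. It generally falls into two categories: unsupervised and supervised representation learning. Most unsupervised methods use the latent variables of an autoencoder (such as a denoising autoencoder or a sparse autoencoder) as the representation. The more recent variational autoencoder tolerates noise and outliers better. However, inferring the latent structure of given data exactly is almost impossible, so approximate inference strategies are used instead. In addition, some unsupervised methods aim to approximate a specific similarity measure. One proposed unsupervised similarity-preserving framework uses matrix factorization to preserve pairwise dynamic time warping (DTW) similarities. Another learns DTW-preserving shapelets, so that the Euclidean distance in the transformed space approximates the true DTW distance between the original data. Supervised representation learning methods can use the label information of the data to better capture its semantic structure. Siamese networks and triplet networks are two popular models; their objective is to maximize the distance between classes and minimize the distance within each class.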
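The objective attributed to triplet networks can be written as a margin loss. The sketch below is illustrative only: the function name `triplet_margin_loss`, the Euclidean distance, and the default margin of 1.0 are assumptions, not details given in the text. The loss is zero once the anchor is closer to a same-class sample (the positive) than to a different-class sample (the negative) by at least the margin.

```python
import math


def triplet_margin_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss on Euclidean distances between embeddings."""
    d_pos = math.dist(anchor, positive)  # same-class distance, to be minimized
    d_neg = math.dist(anchor, negative)  # cross-class distance, to be maximized
    return max(0.0, d_pos - d_neg + margin)


# A well-separated triplet incurs no loss; a negative that sits too close does.
print(triplet_margin_loss((0.0, 0.0), (0.0, 1.0), (3.0, 4.0)))
print(triplet_margin_loss((0.0, 0.0), (0.0, 1.0), (0.0, 1.5)))
```

In practice the embeddings would come from a shared encoder network and the loss would be averaged over many sampled triplets.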
