Enhancing Model Learning and Interpretation Using Multiple Molecular Graph Representations for Compound Property and Activity Prediction

Graph neural networks (GNNs) perform well in compound property and activity prediction because they can efficiently learn complex molecular graph structures. However, two main limitations persist: compound representation and model interpretability. Atom-level molecular graph representations are commonly used because they capture the natural topology of a molecule, but they may not fully express important substructures or functional groups that significantly influence molecular properties. Consequently, recent studies have proposed alternative representations that use graph reduction techniques to integrate higher-level information, and have leveraged both representations for model learning. However, the effects of different molecular graph representations on model learning and interpretation remain understudied. Interpretability is also crucial for drug discovery, as it can offer chemical insights and inspire optimization. Numerous studies incorporate model interpretation to explain the rationale behind predictions, but most focus solely on individual predictions, with little analysis of how interpretations differ across molecular graph representations. This research introduces multiple molecular graph representations that incorporate higher-level information and investigates their effects on model learning and interpretation from diverse perspectives. The results indicate that combining the atom graph representation with a reduced molecular graph representation can yield promising model performance. Furthermore, the interpretation results highlight significant features and potential substructures that consistently align with background knowledge. These multiple molecular graph representations and the interpretation analysis can bolster model comprehension and facilitate related applications in drug discovery.
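The abstract does not specify which reduction scheme the reduced molecular graph uses. The sketch below is only a minimal illustration of the general idea. It assumes that atoms have already been partitioned into substructures such as rings and functional groups. In practice a cheminformatics toolkit such as RDKit would perceive these substructures; here the partition is written out by hand. The sketch contracts each substructure into a single node. The function name `reduce_graph` and the element-count node features are hypothetical choices made for illustration, not the authors' method.

```python
from collections import Counter


def reduce_graph(bonds, groups, elements):
    """Contract an atom-level graph into a reduced graph (illustrative sketch).

    bonds:    list of (i, j) atom-index pairs (undirected bonds)
    groups:   list of sets; each set holds the atoms merged into one reduced
              node (hypothetical partition, e.g. a ring or functional group)
    elements: element symbol for each atom index
    Returns (node_features, edges): one Counter of element symbols per reduced
    node, and a sorted list of reduced edges (a, b) with a < b.
    """
    atom_to_node = {}
    for node_id, members in enumerate(groups):
        for atom in members:
            if atom in atom_to_node:
                raise ValueError(f"atom {atom} is assigned to more than one group")
            atom_to_node[atom] = node_id
    if len(atom_to_node) != len(elements):
        raise ValueError("every atom must belong to exactly one group")

    edges = set()
    for i, j in bonds:
        a, b = atom_to_node[i], atom_to_node[j]
        if a != b:  # bonds inside a substructure vanish after contraction
            edges.add((min(a, b), max(a, b)))

    # Crude higher-level node feature: element composition of each substructure.
    node_features = [Counter(elements[atom] for atom in members) for members in groups]
    return node_features, sorted(edges)


# Usage: 4-hydroxybenzoic acid (heavy atoms only).
# Atoms 0-5 form the benzene ring, 6-8 the carboxyl group, and 9 the hydroxyl oxygen.
bonds = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0),
         (0, 6), (6, 7), (6, 8), (3, 9)]
groups = [{0, 1, 2, 3, 4, 5}, {6, 7, 8}, {9}]
elements = ["C"] * 6 + ["C", "O", "O", "O"]
features, reduced_edges = reduce_graph(bonds, groups, elements)
print(reduced_edges)  # [(0, 1), (0, 2)]: ring-carboxyl and ring-hydroxyl
```

A model that combines both views could, for example, run one GNN over the atom graph and another over this reduced graph, then merge their embeddings. How the paper actually fuses the two representations is not stated in the abstract.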
