Molecular representation learning (MRL) is a key step in building the connection between machine learning and chemical science. In particular, it encodes molecules as numerical vectors that preserve molecular structures and features, on top of which downstream tasks (e.g., property prediction) can be performed. Recently, MRL has achieved considerable progress, especially in methods based on deep molecular graph learning. In this survey, we systematically review these graph-based molecular representation techniques, with an emphasis on methods that incorporate chemical domain knowledge. Specifically, we first introduce the features of 2D and 3D molecular graphs. We then summarize MRL methods and categorize them into three groups based on their input. Furthermore, we discuss some typical chemical applications supported by MRL. To facilitate studies in this fast-developing area, we also list the benchmarks and commonly used datasets in the paper. Finally, we share our thoughts on future research directions.