Learning expressive molecular representations is crucial for accurate prediction of molecular properties. Despite significant advances in molecular representation learning, graph neural networks (GNNs) generally face limitations such as neighbor explosion, under-reaching, over-smoothing, and over-squashing. GNNs also tend to have high computational complexity due to their large number of parameters. These limitations typically emerge or worsen on relatively large graphs or with deeper GNN architectures. One idea for overcoming these problems is to simplify a molecular graph into a smaller yet rich and informative one, on which GNNs are more efficient and less challenging to train. To this end, we propose a novel molecular graph coarsening framework named FunQG, which utilizes functional groups, influential building blocks of a molecule that determine its properties, based on the graph-theoretic concept of quotient graphs. Our experiments show that the resulting informative graphs are much smaller than the original molecular graphs and are thus good candidates for training GNNs. We apply FunQG to popular molecular property prediction benchmarks and compare the performance of a GNN architecture on the resulting datasets with several state-of-the-art baselines on the original datasets. Experimentally, this method significantly outperforms previous baselines on various datasets while dramatically reducing the number of parameters and the computational complexity. FunQG can therefore serve as a simple, cost-effective, and robust method for molecular representation learning.
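To make the quotient-graph idea concrete, the sketch below coarsens a molecular graph by contracting functional-group atoms into single nodes. This is only a minimal illustration, not the FunQG implementation: the SMARTS patterns, the first-match overlap policy, and the edge rule are assumptions chosen for demonstration.

```python
# Minimal sketch of functional-group-based coarsening via a quotient graph.
# NOT the authors' FunQG method; patterns and policies are illustrative.
import networkx as nx
from rdkit import Chem

def mol_to_nx(mol):
    """Convert an RDKit molecule to an undirected NetworkX atom graph."""
    g = nx.Graph()
    for atom in mol.GetAtoms():
        g.add_node(atom.GetIdx(), symbol=atom.GetSymbol())
    for bond in mol.GetBonds():
        g.add_edge(bond.GetBeginAtomIdx(), bond.GetEndAtomIdx())
    return g

def functional_group_partition(mol, smarts_patterns):
    """Partition atom indices into functional-group blocks plus singletons."""
    used, blocks = set(), []
    for smarts in smarts_patterns:
        pattern = Chem.MolFromSmarts(smarts)
        for match in mol.GetSubstructMatches(pattern):
            if used.isdisjoint(match):  # keep blocks pairwise disjoint
                blocks.append(set(match))
                used.update(match)
    # Every atom not covered by a group becomes its own singleton block.
    blocks.extend({i} for i in range(mol.GetNumAtoms()) if i not in used)
    return blocks

# Hypothetical group patterns: carboxylic acid first, then generic C(=O)O.
PATTERNS = ["C(=O)[OX2H1]", "C(=O)O"]

mol = Chem.MolFromSmiles("CC(=O)Oc1ccccc1C(=O)O")  # aspirin
atom_graph = mol_to_nx(mol)
partition = functional_group_partition(mol, PATTERNS)

# Contract each block into one node; two coarse nodes are adjacent whenever
# any bond in the original graph connects their blocks.
coarse = nx.quotient_graph(atom_graph, partition, relabel=True)
print(atom_graph.number_of_nodes(), "->", coarse.number_of_nodes())  # 13 -> 9
```

On aspirin, the ester and carboxylic-acid atoms collapse into single nodes, shrinking the 13-atom graph to 9 coarse nodes; the paper's point is that GNNs train faster and with fewer parameters on such reduced graphs.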