Learning expressive molecular representations is crucial for accurate prediction of molecular properties. Despite significant advances in molecular representation learning, graph neural networks (GNNs) generally face limitations such as neighbor explosion, under-reaching, over-smoothing, and over-squashing. GNNs also usually incur high computational costs because of their large number of parameters. These limitations typically emerge or worsen on relatively large graphs or with deeper GNN architectures. One idea for overcoming these problems is to simplify a molecular graph into a small, rich, and informative one, on which GNNs are more efficient and easier to train. To this end, we propose FunQG, a novel molecular graph coarsening framework that utilizes functional groups, influential building blocks of a molecule that determine its properties, based on a graph-theoretic concept called the quotient graph. Our experiments show that the resulting informative graphs are much smaller than the original molecular graphs and are therefore good candidates for training GNNs. We apply FunQG to popular molecular property prediction benchmarks and compare the performance of popular baseline GNNs on the resulting datasets with that of several state-of-the-art baselines on the original datasets. Our experiments show that this method significantly outperforms previous baselines on various datasets while dramatically reducing the number of parameters and computational costs. Therefore, FunQG can serve as a simple, cost-effective, and robust method for molecular representation learning.
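The coarsening idea can be illustrated with a minimal quotient-graph sketch. The following toy example (not the FunQG implementation, which derives functional groups chemically) uses networkx to contract a hand-written node partition of a small graph into super-nodes; the graph and partition are invented for illustration only:

```python
# Hypothetical sketch of coarsening a graph via a quotient graph with
# networkx. FunQG partitions a molecule by its functional groups; here
# the partition is hand-written purely for illustration.
import networkx as nx

# Toy "molecular" graph: nodes 0-5, where {0, 1, 2} and {3, 4, 5} stand
# in for two functional groups joined by the bond (2, 3).
G = nx.Graph([(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 3)])

# Assumed functional-group blocks (a partition of the node set).
partition = [{0, 1, 2}, {3, 4, 5}]

# The quotient graph contracts each block into one super-node; two
# super-nodes are adjacent iff some edge of G crosses between the blocks.
Q = nx.quotient_graph(G, partition)

print(Q.number_of_nodes())  # 2 super-nodes
print(Q.number_of_edges())  # 1 edge, induced by the crossing bond (2, 3)
```

The coarsened graph has two nodes and one edge instead of six nodes and six edges, which conveys how a quotient graph can shrink the input a GNN must process.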