Neural networks excel at processing unstructured data but often fail to generalise out of distribution, whereas classical algorithms guarantee correctness but lack flexibility. We explore whether pretraining Graph Neural Networks (GNNs) on classical algorithms can improve their performance on two molecular property prediction tasks from the Open Graph Benchmark: ogbg-molhiv (HIV inhibition) and ogbg-molclintox (clinical toxicity). GNNs trained on 24 classical algorithms from the CLRS Algorithmic Reasoning Benchmark are used to initialise, and then freeze, selected layers of a second GNN that is trained for molecular prediction. Compared with a randomly initialised baseline, the pretrained models consistently match or outperform it, with Segments Intersect pretraining yielding a 6% absolute gain on ogbg-molhiv and Dijkstra pretraining a 3% gain on ogbg-molclintox. These results demonstrate that embedding classical algorithmic priors into GNNs provides useful inductive biases, boosting performance on complex, real-world graph data.
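The initialise-and-freeze step described above can be sketched in a few lines. This is a minimal, dependency-free illustration, not the authors' implementation: the layer names (`encoder`, `processor`, `head`, `decoder`), the flat lists standing in for weight tensors, and the helpers `init_and_freeze` and `sgd_step` are all hypothetical, and the abstract does not state which layers are actually transferred.

```python
import copy


def init_and_freeze(target_params, pretrained_params, layers_to_transfer):
    """Copy selected layers from a pretrained model and mark them as frozen.

    Both models are represented as dicts mapping a layer name to a flat list
    of weights (a stand-in for real tensors; an assumption of this sketch).
    """
    params = copy.deepcopy(target_params)
    frozen = set()
    for name in layers_to_transfer:
        if name not in pretrained_params or name not in params:
            raise KeyError(f"layer {name!r} must exist in both models")
        if len(pretrained_params[name]) != len(params[name]):
            raise ValueError(f"shape mismatch for layer {name!r}")
        params[name] = list(pretrained_params[name])
        frozen.add(name)
    return params, frozen


def sgd_step(params, grads, frozen, lr=0.1):
    """Apply one gradient step, leaving frozen layers untouched."""
    return {
        name: list(weights) if name in frozen
        else [w - lr * g for w, g in zip(weights, grads[name])]
        for name, weights in params.items()
    }


# Hypothetical toy weights: the "pretrained" model stands in for a GNN
# trained on an algorithmic task, "random_init" for the molecular GNN.
pretrained = {"processor": [0.5, -0.2], "decoder": [1.0, 1.0]}
random_init = {"encoder": [0.1, 0.1], "processor": [0.0, 0.0], "head": [0.3]}

params, frozen = init_and_freeze(random_init, pretrained, ["processor"])
grads = {name: [1.0] * len(w) for name, w in params.items()}
updated = sgd_step(params, grads, frozen)
```

After the update step, the transferred `processor` weights still equal the pretrained values, while `encoder` and `head` have moved. In a real framework the same effect is usually obtained by loading the pretrained weights into the chosen layers and excluding them from the optimiser.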