The ability to discriminate between generative graph models is critical to understanding complex structural patterns in both synthetic graphs and the real-world structures that they emulate. While Graph Neural Networks (GNNs) are increasingly used, to great effect, in graph classification tasks, few studies explore their integration with interpretable graph-theoretic features. This paper investigates the classification of synthetic graph families using a hybrid approach that combines GNNs with engineered graph-theoretic features. We generate a large and structurally diverse synthetic dataset comprising graphs from five representative generative families: Erdős-Rényi, Watts-Strogatz, Barabási-Albert, Holme-Kim, and Stochastic Block Model. These graphs contain up to 1 × 10^4 nodes and up to 1.1 × 10^5 edges. A comprehensive range of node-level and graph-level features is extracted for each graph and pruned using a Random Forest-based feature selection pipeline. The selected features are integrated into six GNN architectures: GCN, GAT, GATv2, GIN, GraphSAGE, and GTN. The hyperparameters of each architecture are tuned using Optuna. Finally, the models are compared against a baseline Support Vector Machine (SVM) trained solely on the handcrafted features. Our evaluation demonstrates that GraphSAGE and GTN achieve the highest classification performance, with 98.5% accuracy and strong class separation, as evidenced by t-SNE and UMAP visualisations. GCN and GIN also performed well, while the GAT-based models lagged due to limitations in their ability to capture global structures. The SVM baseline confirmed the importance of message passing for performance gains and meaningful class separation.
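As a rough illustration of the dataset-construction step described in the abstract, the following pure-Python sketch generates small graphs from the five families and computes a handful of graph-level features. It is not the authors' pipeline: the helper names (`erdos_renyi`, `watts_strogatz`, `holme_kim`, `stochastic_block_model`, `graph_features`, `make_dataset`), all parameter values, and the four features shown are assumptions chosen for illustration, and the Random Forest selection and GNN stages are not reproduced. Barabási-Albert is obtained here as the Holme-Kim generator with the triad-formation probability set to zero.

```python
import random
from itertools import combinations

def _empty(n):
    return {v: set() for v in range(n)}

def _link(adj, u, v):
    if u != v:
        adj[u].add(v)
        adj[v].add(u)

def erdos_renyi(n, p, rng):
    adj = _empty(n)
    for u, v in combinations(range(n), 2):
        if rng.random() < p:
            _link(adj, u, v)
    return adj

def watts_strogatz(n, k, beta, rng):
    adj = _empty(n)
    for u in range(n):                      # ring lattice: k/2 neighbours per side
        for j in range(1, k // 2 + 1):
            _link(adj, u, (u + j) % n)
    for u in range(n):                      # rewire each lattice edge with prob. beta
        for j in range(1, k // 2 + 1):
            v = (u + j) % n
            if v in adj[u] and rng.random() < beta:
                free = [w for w in range(n) if w != u and w not in adj[u]]
                if free:
                    adj[u].discard(v)
                    adj[v].discard(u)
                    _link(adj, u, rng.choice(free))
    return adj

def holme_kim(n, m, p_triad, rng):
    """Preferential attachment plus triad formation (p_triad=0: Barabasi-Albert)."""
    adj = _empty(n)
    pool = list(range(m))                   # node repeated once per unit of degree
    for u in range(m, n):
        chosen, last = set(), None
        while len(chosen) < m:
            if last is not None and rng.random() < p_triad:
                cands = [w for w in adj[last] if w != u and w not in chosen]
                if cands:                   # close a triangle via a neighbour of `last`
                    chosen.add(rng.choice(cands))
                    continue
            v = rng.choice(pool)            # degree-proportional choice
            if v not in chosen:
                chosen.add(v)
                last = v
        for v in chosen:
            _link(adj, u, v)
            pool.extend([u, v])
    return adj

def stochastic_block_model(sizes, p_in, p_out, rng):
    block = [b for b, size in enumerate(sizes) for _ in range(size)]
    adj = _empty(len(block))
    for u, v in combinations(range(len(block)), 2):
        if rng.random() < (p_in if block[u] == block[v] else p_out):
            _link(adj, u, v)
    return adj

def graph_features(adj):
    n = len(adj)
    degrees = [len(nbrs) for nbrs in adj.values()]
    mean_deg = sum(degrees) / n
    clustering = []
    for nbrs in adj.values():               # local clustering coefficient per node
        k = len(nbrs)
        closed = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        clustering.append(2 * closed / (k * (k - 1)) if k > 1 else 0.0)
    return {
        "density": sum(degrees) / (n * (n - 1)),
        "mean_degree": mean_deg,
        "degree_variance": sum((d - mean_deg) ** 2 for d in degrees) / n,
        "avg_clustering": sum(clustering) / n,
    }

def make_dataset(n=200, per_family=3, seed=0):
    rng = random.Random(seed)
    families = {                            # illustrative parameters only
        "ER": lambda: erdos_renyi(n, 0.04, rng),
        "WS": lambda: watts_strogatz(n, 8, 0.1, rng),
        "BA": lambda: holme_kim(n, 4, 0.0, rng),
        "HK": lambda: holme_kim(n, 4, 0.8, rng),
        "SBM": lambda: stochastic_block_model([n // 2, n - n // 2], 0.07, 0.01, rng),
    }
    return [(name, graph_features(gen()))
            for name, gen in families.items() for _ in range(per_family)]

if __name__ == "__main__":
    for label, feats in make_dataset():
        print(label, {key: round(val, 3) for key, val in feats.items()})
```

Feature rows of this kind could then be passed to a feature-selection step or attached to the graphs as inputs to a classifier. At the graph sizes quoted in the abstract, a dedicated graph library's generators would be a more practical choice than these quadratic-time stand-ins.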