提升图神经网络的环计数能力：使用I$^2$-GNNs (Boosting the Cycle Counting Power of Graph Neural Networks with I$^2$-GNNs)

Message Passing Neural Networks (MPNNs) are a widely used class of Graph Neural Networks (GNNs). The limited representational power of MPNNs inspires the study of provably powerful GNN architectures. However, knowing one model is more powerful than another gives little insight about what functions they can or cannot express. It is still unclear whether these models are able to approximate specific functions such as counting certain graph substructures, which is essential for applications in biology, chemistry and social network analysis. Motivated by this, we propose to study the counting power of Subgraph MPNNs, a recent and popular class of powerful GNN models that extract rooted subgraphs for each node, assign the root node a unique identifier and encode the root node's representation within its rooted subgraph. Specifically, we prove that Subgraph MPNNs fail to count more-than-4-cycles at node level, implying that node representations cannot correctly encode the surrounding substructures like ring systems with more than four atoms. To overcome this limitation, we propose I$^2$-GNNs to extend Subgraph MPNNs by assigning different identifiers for the root node and its neighbors in each subgraph. I$^2$-GNNs' discriminative power is shown to be strictly stronger than Subgraph MPNNs and partially stronger than the 3-WL test. More importantly, I$^2$-GNNs are proven capable of counting all 3, 4, 5 and 6-cycles, covering common substructures like benzene rings in organic chemistry, while still keeping linear complexity. To the best of our knowledge, it is the first linear-time GNN model that can count 6-cycles with theoretical guarantees. We validate its counting power in cycle counting tasks and demonstrate its competitive performance in molecular prediction benchmarks.

翻译：----- 信息传递神经网络（MPNNs）是广泛使用的图神经网络（GNNs）的一种。由于MPNNs的有限表示能力，激发了对可证明强大GNN体系结构的研究。然而，了解一个模型比另一个模型更强大仅对其能或不能表达哪些函数提供很少的见解，仍不清楚这些模型是否能够近似特定的函数，例如计数特定的图子结构，这对于生物学、化学和社交网络分析等应用至关重要。出于这个动机，我们提出了研究Subgraph MPNNs的计数能力，这是最近和流行的一类强大的GNN模型，可以为每个节点提取根子图、为根节点分配一个唯一的标识符，并将根节点的表示编码在其根子图中。具体来说，我们证明Subgraph MPNNs在节点级别上无法计数超过四个环的子结构，这意味着节点表示无法正确编码周围的包含四个以上原子的环系统等子结构。为了克服这种限制，我们提出了I$^2$-GNNs，通过在每个子图中为根节点和其邻居分配不同的标识符来扩展Subgraph MPNNs。I$^2$-GNNs的判别能力被证明严格强于Subgraph MPNNs，但部分比3-WL测试更强。更重要的是，I$^2$-GNNs被证明能够计数所有的3、4、5和6个环，覆盖有机化学中的苯环等常见子结构，同时仍保持线性复杂度。据我们所知，这是第一个能够在理论上保证的线性时间GNN模型，可以计算6-环。我们在循环计数任务中验证了其计数能力，并在分子预测基准测试中展示了其竞争性能。