Out-of-distribution (OOD) generalization is an important challenge for Graph Neural Networks (GNNs). Recent works apply different graph edits to generate augmented environments and learn an invariant GNN for generalization. However, label shift often occurs during augmentation, since editing the graph structure inevitably alters the graph label. This introduces inconsistent predictive relationships among the augmented environments, which harms generalization. To address this issue, we propose \textbf{LiSA}, which generates label-invariant augmentations to facilitate graph OOD generalization. Instead of resorting to graph edits, LiSA exploits \textbf{L}abel-\textbf{i}nvariant \textbf{S}ubgraphs of the training graphs to construct \textbf{A}ugmented environments. Specifically, LiSA first designs variational subgraph generators to extract locally predictive patterns and efficiently construct multiple label-invariant subgraphs. The subgraphs produced by different generators are then collected to build distinct augmented environments. To promote diversity among these environments, LiSA further introduces a tractable energy-based regularization that enlarges the pairwise distances between environment distributions. In this manner, LiSA generates diverse augmented environments with a consistent predictive relationship, facilitating the learning of an invariant GNN. Extensive experiments on node-level and graph-level OOD benchmarks show that LiSA achieves impressive generalization performance with different GNN backbones. Code is available at \url{https://github.com/Samyu0304/LiSA}.
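The pipeline described above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the generator here is a simplified deterministic stand-in for LiSA's variational subgraph generator (it scores edges from node features and thresholds them into a keep-mask), and the diversity term is a generic negative pairwise-distance penalty standing in for the paper's energy-based regularization. All function names (`subgraph_generator`, `diversity_penalty`) and the toy graph are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def subgraph_generator(adj, node_feats, w):
    """Hypothetical stand-in for a variational subgraph generator:
    scores each edge from its endpoints' features and keeps the
    high-probability edges, so the retained subgraph preserves
    locally predictive structure (the label-invariant part)."""
    scores = node_feats @ w                      # per-node importance
    edge_logits = scores[:, None] + scores[None, :]
    probs = 1.0 / (1.0 + np.exp(-edge_logits))   # sigmoid -> edge keep-probability
    mask = (probs > 0.5).astype(float)
    return adj * mask, probs * (adj > 0)         # subgraph + its edge distribution

def diversity_penalty(prob_list):
    """Sketch of the diversity regularizer: the sum of negative pairwise
    L2 distances between the generators' edge distributions, so
    minimizing it pushes the augmented environments apart."""
    penalty = 0.0
    for i in range(len(prob_list)):
        for j in range(i + 1, len(prob_list)):
            penalty -= np.linalg.norm(prob_list[i] - prob_list[j])
    return penalty

# Toy graph: 5 nodes, random features, three generators -> three environments.
n, d, k = 5, 4, 3
adj = (rng.random((n, n)) > 0.5).astype(float)
adj = np.triu(adj, 1); adj = adj + adj.T         # symmetric, no self-loops
feats = rng.standard_normal((n, d))
weights = [rng.standard_normal(d) for _ in range(k)]

envs, dists = zip(*(subgraph_generator(adj, feats, w) for w in weights))
print("edges kept per environment:", [int(e.sum()) for e in envs])
print("diversity penalty:", diversity_penalty(list(dists)))
```

In the actual method the generator parameters would be trained jointly with a label-invariance objective and this diversity term, and the resulting environments fed to an invariant-learning objective for the GNN; the sketch only shows how multiple subgraph views of one graph can form distinct environments.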