Most Graph Neural Networks (GNNs) predict the labels of unseen graphs by learning the correlation between the input graphs and the labels. However, through an investigation of graph classification on training graphs with severe bias, we surprisingly discover that GNNs tend to exploit spurious correlations to make decisions, even when a causal correlation always exists. This implies that existing GNNs trained on such biased datasets suffer from poor generalization. By analyzing this problem from a causal view, we find that disentangling and decorrelating the causal and bias latent variables from the biased graphs are both crucial for debiasing. Inspired by this, we propose a general disentangled GNN framework to learn the causal substructure and the bias substructure, respectively. In particular, we design a parameterized edge mask generator that explicitly splits the input graph into a causal subgraph and a bias subgraph. Two GNN modules, supervised by causal- and bias-aware loss functions respectively, are then trained to encode the causal and bias subgraphs into their corresponding representations. With the disentangled representations, we synthesize counterfactual unbiased training samples to further decorrelate the causal and bias variables. Moreover, to better benchmark the severe bias problem, we construct three new graph datasets with controllable bias degrees that are easier to visualize and explain. Experimental results demonstrate that our approach achieves superior generalization performance over existing baselines. Furthermore, owing to the learned edge mask, the proposed model has appealing interpretability and transferability. Code and data are available at: https://github.com/googlebaba/DisC.
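The splitting step performed by the edge mask generator can be sketched as follows. This is a minimal, illustrative stand-in, not the authors' implementation: in DisC the per-edge scores are produced by a learned parameterized network and used as soft weights end-to-end, whereas here we assume precomputed scores and a hard threshold purely to show how an edge list decomposes into causal and bias subgraphs.

```python
def split_edges(edges, edge_scores, threshold=0.5):
    """Split a graph's edge list into 'causal' and 'bias' subgraphs
    according to mask scores in [0, 1]. A simplified stand-in for the
    parameterized edge mask generator described in the abstract."""
    causal, bias = [], []
    for edge, score in zip(edges, edge_scores):
        # High-scoring edges are treated as causal, the rest as bias.
        (causal if score >= threshold else bias).append(edge)
    return causal, bias

# Toy example: a 4-cycle with hypothetical sigmoid-activated mask scores.
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
scores = [0.9, 0.8, 0.2, 0.1]
causal_sub, bias_sub = split_edges(edges, scores)
```

In the full framework, each subgraph would then be fed to its own GNN module (causal- or bias-supervised), and the resulting disentangled representations recombined to synthesize counterfactual unbiased samples.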