Graph neural networks (GNNs) have become a popular approach to integrating structural inductive biases into NLP models. However, there has been little work on interpreting them, and specifically on understanding which parts of the graphs (e.g. syntactic trees or co-reference structures) contribute to a prediction. In this work, we introduce a post-hoc method for interpreting the predictions of GNNs which identifies unnecessary edges. Given a trained GNN model, we learn a simple classifier that, for every edge in every layer, predicts if that edge can be dropped. We demonstrate that such a classifier can be trained in a fully differentiable fashion, employing stochastic gates and encouraging sparsity through the expected $L_0$ norm. We use our technique as an attribution method to analyze GNN models for two tasks -- question answering and semantic role labeling -- providing insights into the information flow in these models. We show that we can drop a large proportion of edges without deteriorating the performance of the model, and that the remaining edges can be analyzed to interpret model predictions.
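As a rough illustration of the edge-gating idea described above, the sketch below implements a per-edge stochastic gate with an expected-$L_0$ penalty using a Hard Concrete relaxation. The module name `EdgeGate`, the scorer architecture, the tensor shapes, and the hyperparameters are illustrative assumptions, not the authors' exact implementation.

```python
import math
import torch
import torch.nn as nn

# Hard Concrete parameters: temperature and stretch limits (assumed values).
BETA, GAMMA, ZETA = 2.0 / 3.0, -0.1, 1.1


class EdgeGate(nn.Module):
    """Stochastic gate per edge, trained with an expected-L0 sparsity penalty."""

    def __init__(self, hidden_dim: int):
        super().__init__()
        # Simple classifier mapping an edge representation to a gate logit.
        self.gate_scorer = nn.Linear(2 * hidden_dim, 1)

    def forward(self, edge_repr: torch.Tensor):
        # edge_repr: [num_edges, 2 * hidden_dim], e.g. concatenated endpoint states.
        log_alpha = self.gate_scorer(edge_repr).squeeze(-1)  # [num_edges]

        if self.training:
            # Reparameterized sample from the Concrete distribution.
            u = torch.rand_like(log_alpha).clamp(1e-6, 1 - 1e-6)
            s = torch.sigmoid((torch.log(u) - torch.log(1 - u) + log_alpha) / BETA)
        else:
            s = torch.sigmoid(log_alpha)

        # Stretch and clip so gates can take exact 0/1 values (edges fully dropped or kept).
        gate = torch.clamp(s * (ZETA - GAMMA) + GAMMA, 0.0, 1.0)  # [num_edges]

        # Expected L0 norm: probability that each gate is non-zero.
        p_nonzero = torch.sigmoid(log_alpha - BETA * math.log(-GAMMA / ZETA))
        expected_l0 = p_nonzero.sum()
        return gate, expected_l0
```

In use, `gate` would multiply the messages passed along the corresponding edges in a given GNN layer, and `expected_l0`, scaled by a sparsity coefficient, would be added to the training objective so that as many gates as possible are pushed to exactly zero.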