Using attention weights to identify information that is important for a model's decision-making is a popular approach to interpreting attention-based neural networks. In practice, this is commonly realized by generating a heat-map for each document based on its attention weights. However, this interpretation method is fragile, and contradictory examples are easy to find. In this paper, we propose a corpus-level explanation approach that aims to capture causal relationships between keywords and model predictions by learning, from attention weights, the importance of keywords for predicted labels across a training corpus. Building on this idea, we further propose a concept-based explanation method that can automatically learn higher-level concepts and their importance to the model's prediction tasks. Our concept-based explanation method is built upon a novel Abstraction-Aggregation Network, which automatically clusters important keywords during end-to-end training. We apply these methods to the document classification task and show that they are powerful in extracting semantically meaningful keywords and concepts. Our consistency analysis based on an attention-based Na\"ive Bayes classifier further demonstrates that these keywords and concepts are important for model predictions.
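To make the corpus-level aggregation idea concrete, the sketch below accumulates per-token attention mass across documents, grouped by predicted label, and ranks the top keywords for each label. This is a minimal sketch under stated assumptions, not the paper's implementation: it assumes tokenized documents, aligned attention weights, and per-document predicted labels are already available, and the function name and data layout are illustrative.

```python
from collections import Counter, defaultdict

def corpus_level_keyword_importance(docs, attentions, predicted_labels, top_k=10):
    """Aggregate per-token attention mass across a corpus, grouped by
    predicted label, and return the top-k keywords for each label.

    docs             -- list of token lists, one per document
    attentions       -- list of attention-weight lists aligned with docs
    predicted_labels -- the model's predicted label for each document
    """
    scores = defaultdict(Counter)  # label -> {token: accumulated attention}
    for tokens, weights, label in zip(docs, attentions, predicted_labels):
        for token, w in zip(tokens, weights):
            scores[label][token] += w

    keywords = {}
    for label, counter in scores.items():
        total = sum(counter.values())
        # Normalize so importance scores are comparable across labels.
        keywords[label] = [(tok, w / total) for tok, w in counter.most_common(top_k)]
    return keywords

# Toy usage (purely illustrative data):
docs = [["great", "movie"], ["boring", "plot"]]
attn = [[0.8, 0.2], [0.7, 0.3]]
labels = ["positive", "negative"]
print(corpus_level_keyword_importance(docs, attn, labels, top_k=2))
```

Aggregating over the whole training corpus rather than inspecting a single document's heat-map is what distinguishes this view from the per-document interpretation the abstract criticizes as fragile.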