Sparse attention has been claimed to increase model interpretability under the assumption that it highlights influential inputs. Yet the attention distribution is typically over representations internal to the model rather than the inputs themselves, suggesting this assumption may not have merit. Building on recent work exploring the interpretability of attention, we design a set of experiments to understand how sparsity affects our ability to use attention as an explainability tool. On three text classification tasks, we verify that only a weak relationship exists between inputs and co-indexed intermediate representations -- under sparse attention and otherwise. Further, we find no plausible mapping from sparse attention distributions to a sparse set of influential inputs through other avenues. Rather, we observe in this setting that inducing sparsity may make it less plausible that attention can be used as a tool for understanding model behavior.
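To make the central distinction concrete, the sketch below is a minimal illustration rather than the paper's actual setup: it assumes sparsity is induced with a sparsemax transform (Martins & Astudillo, 2016), one standard way to obtain sparse attention, and shows that the resulting weights index the model's hidden states, not the raw input tokens.

```python
import torch

def sparsemax(scores: torch.Tensor) -> torch.Tensor:
    """Sparsemax (Martins & Astudillo, 2016): a softmax alternative that
    can assign exactly zero probability to some positions."""
    # Sort scores in descending order along the attention axis.
    z, _ = torch.sort(scores, dim=-1, descending=True)
    k = torch.arange(1, scores.size(-1) + 1, device=scores.device, dtype=scores.dtype)
    z_cumsum = z.cumsum(dim=-1)
    # Support set: positions where 1 + k * z_k > cumulative sum.
    support = 1.0 + k * z > z_cumsum
    k_z = support.sum(dim=-1, keepdim=True)  # size of the support set
    # Threshold tau such that the clipped scores sum to 1.
    tau = (z_cumsum.gather(-1, k_z - 1) - 1.0) / k_z.to(scores.dtype)
    return torch.clamp(scores - tau, min=0.0)

# Toy attention step. Note the distribution lives over hidden states
# (intermediate representations), not over the input tokens themselves.
torch.manual_seed(0)
hidden = torch.randn(5, 8)             # 5 intermediate representations, dim 8
query = torch.randn(8)
scores = hidden @ query / 8 ** 0.5     # scaled dot-product scores
print(torch.softmax(scores, dim=-1))   # dense: every position receives mass
print(sparsemax(scores))               # sparse: some positions get exactly 0
```

Running the example shows softmax spreading mass over all positions while sparsemax zeros several out; in both cases, however, the weights co-index intermediate representations, which is why a sparse distribution alone need not pick out a sparse set of influential inputs.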