The debate around the interpretability of attention mechanisms centers on whether attention scores can serve as a proxy for the relative amount of signal carried by sub-components of the data. We propose to study the interpretability of attention in the context of set machine learning, where each data point is an unordered collection of instances with a global label. For classical multiple-instance-learning problems and simple extensions, there is a well-defined "importance" ground truth that can be leveraged to cast interpretation as a binary classification problem, which we can evaluate quantitatively. By building synthetic datasets over several data modalities, we perform a systematic assessment of attention-based interpretations. We find that attention distributions are indeed often reflective of the relative importance of individual instances, but that silent failures occur in which a model achieves high classification performance while its attention patterns do not align with expectations. Based on these observations, we propose ensembling to minimize the risk of misleading attention-based explanations.
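To make the setup concrete, the sketch below illustrates the general pattern the abstract describes, not the paper's actual implementation: an attention-based MIL pooling layer (in the style of Ilse et al., 2018), scoring of attention weights against ground-truth key-instance labels as a binary classifier, and averaging of attention maps across an ensemble. All names (`AttentionMILPooling`, `attention_auc`, `ensemble_attention`), the PyTorch/scikit-learn choices, and the toy data are assumptions for illustration.

```python
import torch
import torch.nn as nn
from sklearn.metrics import roc_auc_score

class AttentionMILPooling(nn.Module):
    """Attention pooling over a bag of instance embeddings (Ilse et al.-style sketch)."""
    def __init__(self, dim: int, hidden: int = 64):
        super().__init__()
        # Two-layer scoring network producing one unnormalized score per instance.
        self.score = nn.Sequential(nn.Linear(dim, hidden), nn.Tanh(), nn.Linear(hidden, 1))

    def forward(self, bag: torch.Tensor):
        # bag: (n_instances, dim) -> attention weights summing to 1 over the bag.
        attn = torch.softmax(self.score(bag).squeeze(-1), dim=0)
        return attn @ bag, attn  # (bag embedding, per-instance attention)

def attention_auc(attn: torch.Tensor, key_mask) -> float:
    # Cast interpretation as binary classification: attention weights are the
    # scores, ground-truth "key instance" indicators are the labels.
    return roc_auc_score(key_mask, attn.detach().cpu().numpy())

def ensemble_attention(models, bag: torch.Tensor) -> torch.Tensor:
    # Average attention over independently trained models to reduce the risk
    # that any single model's attention pattern is misleading.
    with torch.no_grad():
        return torch.stack([m(bag)[1] for m in models]).mean(dim=0)

# Hypothetical usage on a random bag of 10 instances, 3 of which are "key".
bag = torch.randn(10, 32)
key_mask = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
pool = AttentionMILPooling(dim=32)
_, attn = pool(bag)
print(attention_auc(attn, key_mask))
```

With a ground-truth `key_mask` available, as in the synthetic datasets described above, the AUC of the attention weights quantifies how well attention tracks instance importance, and the same metric applied to the ensemble-averaged attention measures whether ensembling reduces silent failures.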