Attention mechanisms form a core component of several successful deep learning architectures and are based on one key idea: ``The output depends only on a small (but unknown) segment of the input.'' In several practical applications, such as image captioning and language translation, this assumption mostly holds. In trained models with an attention mechanism, the outputs of an intermediate module that encodes the segment of the input responsible for the output are often used as a way to peek into the `reasoning' of the network. We make this notion precise for a variant of the classification problem, which we term selective dependence classification (SDC), when used with attention model architectures. Under this setting, we demonstrate various error modes in which an attention model can be accurate yet fail to be interpretable, and we show that such models do arise as a result of training. We illustrate various situations that can accentuate or mitigate this behaviour. Finally, we use our objective definition of interpretability for SDC tasks to evaluate several attention model learning algorithms designed to encourage sparsity, and we demonstrate that these algorithms help improve interpretability.
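To make the interpretability reading described above concrete, the following is a minimal sketch (not the paper's model) of soft attention over input segments, where the intermediate attention weights are what is typically read off as an indication of which segment drove the prediction. All shapes, variable names, and the toy data are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention_classify(segments, query, w_out):
    """segments: (T, d) encodings of T input segments; query: (d,); w_out: (d, C)."""
    scores = segments @ query / np.sqrt(segments.shape[1])  # (T,) relevance scores
    alpha = softmax(scores)                                  # (T,) attention weights
    context = alpha @ segments                               # (d,) weighted summary of the input
    logits = context @ w_out                                 # (C,) class scores
    return logits, alpha

# Toy setup (hypothetical): 5 segments of dimension 8, 3 classes.
T, d, C = 5, 8, 3
segments = rng.normal(size=(T, d))
query = rng.normal(size=d)
w_out = rng.normal(size=(d, C))

logits, alpha = attention_classify(segments, query, w_out)
print("predicted class:", logits.argmax())
# The usual interpretability reading: the highest-weight segment is taken to be the
# part of the input "responsible" for the output -- the claim this paper examines.
print("attention weights:", np.round(alpha, 3))
```

In an SDC-style setting, one would check whether the segment with the largest weight in `alpha` actually coincides with the segment that determines the label; the abstract's point is that high accuracy alone does not guarantee this.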