Although attention mechanisms have become fundamental components of deep learning models, they are vulnerable to perturbations, which may degrade prediction performance and model interpretability. Adversarial training (AT) for attention mechanisms has successfully reduced these drawbacks by considering adversarial perturbations. However, this technique requires label information, and thus its use is limited to supervised settings. In this study, we explore the concept of incorporating virtual AT (VAT) into attention mechanisms, by which adversarial perturbations can be computed even from unlabeled data. To realize this approach, we propose two general training techniques, namely VAT for attention mechanisms (Attention VAT) and "interpretable" VAT for attention mechanisms (Attention iVAT), which extend AT for attention mechanisms to a semi-supervised setting. In particular, Attention iVAT focuses on the differences in attention; thus, it can efficiently learn clearer attention and improve model interpretability, even with unlabeled data. Empirical experiments on six public datasets revealed that our techniques provide better prediction performance than conventional AT-based and VAT-based techniques, as well as stronger agreement with human-provided evidence in detecting important words in sentences. Moreover, our proposal offers these advantages without requiring careful selection of the unlabeled data: even if the model trained with our VAT-based technique uses unlabeled data from a source other than the target task, both prediction performance and model interpretability can be improved.
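To make the core idea concrete, the following is a minimal, self-contained sketch of applying VAT to attention scores: a virtual adversarial perturbation on the pre-softmax attention scores is found by power iteration, and the model is penalized for changing its prediction under that perturbation. Because the loss compares the model's own clean and perturbed predictions, no labels are needed. All names here (`TinyAttentionClassifier`, the `attn_perturbation` argument, and the `epsilon`/`xi`/`n_power` defaults) are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyAttentionClassifier(nn.Module):
    """Word embeddings -> additive attention -> linear classifier (toy model)."""
    def __init__(self, vocab_size=1000, emb_dim=64, n_classes=2):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.score = nn.Linear(emb_dim, 1)   # attention scoring function
        self.out = nn.Linear(emb_dim, n_classes)

    def forward(self, tokens, attn_perturbation=None):
        h = self.emb(tokens)                           # (B, T, D)
        scores = self.score(h).squeeze(-1)             # (B, T) pre-softmax scores
        if attn_perturbation is not None:
            scores = scores + attn_perturbation        # perturb attention scores
        attn = F.softmax(scores, dim=-1)               # attention weights
        context = (attn.unsqueeze(-1) * h).sum(dim=1)  # attention-weighted sum
        return self.out(context)

def _normalize(d):
    # L2-normalize each sample's perturbation direction.
    return d / (d.flatten(1).norm(dim=1).view(-1, 1) + 1e-12)

def attention_vat_loss(model, tokens, epsilon=1.0, xi=1e-6, n_power=1):
    """KL consistency loss against a worst-case attention perturbation.

    No labels are used, so this term can also be computed on unlabeled
    data, which is what extends AT for attention mechanisms to the
    semi-supervised setting described in the abstract.
    """
    with torch.no_grad():
        clean_logits = model(tokens)  # predictions under clean attention
    # Start from a random direction in attention-score space, shape (B, T).
    d = _normalize(torch.randn(tokens.shape, device=tokens.device))
    for _ in range(n_power):  # power iteration toward the worst-case direction
        d = (xi * d).requires_grad_(True)
        kl = F.kl_div(F.log_softmax(model(tokens, d), dim=1),
                      F.softmax(clean_logits, dim=1), reduction="batchmean")
        d = _normalize(torch.autograd.grad(kl, d)[0])
    r_vadv = epsilon * d  # virtual adversarial perturbation (detached from graph)
    return F.kl_div(F.log_softmax(model(tokens, r_vadv), dim=1),
                    F.softmax(clean_logits, dim=1), reduction="batchmean")
```

In a semi-supervised training loop, one would presumably minimize the usual cross-entropy on labeled batches plus this consistency term, weighted by a hyperparameter, on labeled and unlabeled batches alike. Attention iVAT, per the abstract, instead focuses on differences in attention, so its loss would compare attention distributions rather than output predictions; that variant is not sketched here.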