SEAT: Stable and Explainable Attention

Attention mechanisms have become a standard fixture in most state-of-the-art natural language processing (NLP) models, not only because of the outstanding performance they achieve, but also because they provide a plausible innate explanation for the behavior of neural architectures, which is notoriously difficult to analyze. However, recent studies show that attention is unstable against randomness and perturbations during training or testing, such as changes of random seed and slight perturbations of the embedding vectors, which prevents it from becoming a faithful explanation tool. A natural question, therefore, is whether we can find a substitute for current attention that is more stable while keeping the most important explanatory and predictive characteristics of attention. In this paper, we address this problem by giving the first rigorous definition of such a substitute, which we call SEAT (Stable and Explainable Attention). Specifically, a SEAT should have the following three properties: (1) its prediction distribution is enforced to be close to the distribution obtained with vanilla attention; (2) its top-k indices overlap largely with those of vanilla attention; (3) it is robust with respect to perturbations, i.e., any slight perturbation of SEAT does not change the prediction distribution too much, which implicitly indicates that it is stable against randomness and perturbations. Finally, through extensive experiments on various datasets, we compare SEAT with other baseline methods on RNN, BiLSTM, and BERT architectures using six evaluation metrics for model interpretation, stability, and accuracy. The results show that SEAT is more stable against different perturbations and randomness while retaining the explainability of attention, which indicates that it provides a more faithful explanation. Moreover, compared with vanilla attention, SEAT incurs almost no degradation in utility (accuracy).
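The three properties can be made concrete with a small sketch. The abstract does not say which divergence, overlap score, or perturbation model SEAT uses, so every choice below is an assumption for illustration only: Jensen-Shannon divergence (JSD) stands in for the closeness of prediction distributions in property (1), the fraction of shared top-k indices for property (2), and, for property (3), the largest JSD observed when the attention weights are randomly perturbed inside an L-infinity ball. `toy_predict` is a hypothetical classifier head, not a model from the paper.

```python
import math
import random
from typing import Callable, List, Sequence, Set

Vector = List[float]


def softmax(scores: Sequence[float]) -> Vector:
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def kl(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q); terms with p_i == 0 contribute nothing."""
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)


def jsd(p: Sequence[float], q: Sequence[float]) -> float:
    """Jensen-Shannon divergence (natural log), assumed closeness measure for property (1)."""
    m = [(a + b) / 2 for a, b in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def top_k(weights: Sequence[float], k: int) -> Set[int]:
    """Indices of the k largest attention weights."""
    order = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    return set(order[:k])


def top_k_overlap(w1: Sequence[float], w2: Sequence[float], k: int) -> float:
    """Fraction of shared top-k indices, assumed score for property (2)."""
    return len(top_k(w1, k) & top_k(w2, k)) / k


def toy_predict(weights: Sequence[float], values: Sequence[Sequence[float]]) -> Vector:
    """Hypothetical head: softmax of the attention-weighted sum of per-token class logits."""
    n_classes = len(values[0])
    logits = [sum(w * v[c] for w, v in zip(weights, values)) for c in range(n_classes)]
    return softmax(logits)


def worst_sampled_shift(
    predict: Callable[[Sequence[float]], Vector],
    weights: Sequence[float],
    radius: float,
    trials: int = 200,
    seed: int = 0,
) -> float:
    """Largest JSD between the prediction for `weights` and predictions for randomly
    perturbed weights in an L-infinity ball of `radius` (assumed check for property (3))."""
    rng = random.Random(seed)
    base = predict(weights)
    worst = 0.0
    for _ in range(trials):
        noisy = [w + rng.uniform(-radius, radius) for w in weights]
        worst = max(worst, jsd(base, predict(noisy)))
    return worst


if __name__ == "__main__":
    values = [[2.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # hypothetical per-token class logits

    def predict(w: Sequence[float]) -> Vector:
        return toy_predict(w, values)

    vanilla = softmax([1.0, 0.2, -0.5])    # stand-in for vanilla attention
    candidate = softmax([0.9, 0.3, -0.4])  # stand-in for a SEAT candidate
    print(jsd(predict(vanilla), predict(candidate)))     # property (1): smaller is better
    print(top_k_overlap(vanilla, candidate, k=2))         # property (2): prints 1.0
    print(worst_sampled_shift(predict, candidate, 0.05))  # property (3): smaller is better
```

Random sampling only evaluates the perturbations it happens to draw, so `worst_sampled_shift` is a lower bound on the true worst-case change in the prediction distribution; a stricter robustness check would need an adversarial search or a certified bound instead.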