RNN models have achieved state-of-the-art performance in a wide range of text mining tasks. However, these models are often regarded as black boxes and criticized for their lack of interpretability. In this paper, we enhance the interpretability of RNNs by providing interpretable rationales for their predictions. Interpreting RNNs is nonetheless a challenging problem. First, unlike existing methods that rely on local approximation, we aim to provide rationales that are faithful to the decision-making process of the RNN model itself. Second, a flexible interpretation method should be able to assign contribution scores to text segments of varying lengths, rather than only to individual words. To tackle these challenges, we propose a novel attribution method, called REAT, to provide interpretations for RNN predictions. REAT decomposes the final prediction of an RNN into additive contributions of each word in the input text. This additive decomposition further enables REAT to obtain phrase-level attribution scores. In addition, REAT is generally applicable to various RNN architectures, including GRU, LSTM, and their bidirectional versions. Experimental results demonstrate the faithfulness and interpretability of the proposed attribution method. Comprehensive analysis shows that our attribution method can unveil the useful linguistic knowledge captured by RNNs. Further analysis demonstrates that our method can also serve as a debugging tool to examine the vulnerability of RNNs and the reasons behind their failures, pointing to several promising future directions for improving the generalization ability of RNNs.
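To make the idea of additive decomposition concrete, the sketch below illustrates one simple way such an attribution can be realized for a GRU classifier: attribute to word t the change it induces in the output-layer evidence, w_c · (h_t − h_{t−1}), so that the per-word scores sum exactly to the target logit (minus the bias), and a phrase score is just the sum of its word scores. This is a minimal, hypothetical illustration under these assumptions, not the paper's exact REAT formulation; the class and function names (GRUClassifier, word_contributions, phrase_contribution) are invented for the example.

```python
import torch
import torch.nn as nn

# Minimal sketch (not the exact REAT formula): decompose the final logit of a
# GRU classifier into additive per-word contributions, where word t receives
# w_c . (h_t - h_{t-1}). Because the decomposition is additive, phrase-level
# attribution is simply the sum of the word-level scores in the span.

class GRUClassifier(nn.Module):
    def __init__(self, vocab_size=1000, embed_dim=32, hidden_dim=64, num_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.gru = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, num_classes)

    def forward(self, token_ids):
        # token_ids: (batch, seq_len)
        hs, _ = self.gru(self.embed(token_ids))   # hidden states: (batch, seq_len, hidden)
        return self.fc(hs[:, -1, :]), hs          # logits from the last hidden state


def word_contributions(model, token_ids, target_class):
    """Additive word-level scores: contribution_t = w_c . (h_t - h_{t-1})."""
    with torch.no_grad():
        logits, hs = model(token_ids)
        w_c = model.fc.weight[target_class]        # evidence vector of the target class
        prev = torch.zeros_like(hs[:, 0, :])       # h_0 = 0 for nn.GRU by default
        scores = []
        for t in range(hs.size(1)):
            scores.append(((hs[:, t, :] - prev) @ w_c).item())
            prev = hs[:, t, :]
    # Sanity check: scores sum back to the target logit minus the output bias.
    assert abs(sum(scores) - (logits[0, target_class] - model.fc.bias[target_class]).item()) < 1e-4
    return scores


def phrase_contribution(scores, start, end):
    """Phrase-level attribution: sum of the word-level scores in [start, end)."""
    return sum(scores[start:end])


if __name__ == "__main__":
    model = GRUClassifier()
    tokens = torch.randint(0, 1000, (1, 8))        # one toy sentence of 8 tokens
    scores = word_contributions(model, tokens, target_class=1)
    print("word scores:", [round(s, 3) for s in scores])
    print("phrase [2,5) score:", round(phrase_contribution(scores, 2, 5), 3))
```

Because every score comes from the same additive decomposition, word-level and phrase-level attributions are mutually consistent, which is the property the abstract highlights; the actual REAT derivation and its extension to LSTM and bidirectional architectures are given in the paper itself.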