Although self-attention-based models such as Transformers have achieved remarkable success on natural language processing (NLP) tasks, recent studies reveal that they have limitations in modeling sequential transformations (Hahn, 2020), which may prompt a re-examination of recurrent neural networks (RNNs), which have demonstrated impressive results in handling sequential data. Despite many prior attempts to interpret RNNs, their internal mechanisms are not fully understood, and the question of how exactly they capture sequential features remains largely open. In this work, we present a study showing that there exist explainable components within the hidden states that are reminiscent of classical n-gram features. We evaluated these explainable features extracted from trained RNNs on downstream sentiment analysis tasks and found that they can model interesting linguistic phenomena such as negation and intensification. Furthermore, we examined the efficacy of using such n-gram components alone as encoders on tasks such as sentiment analysis and language modeling, revealing that they may play an important role in the overall performance of RNNs. We hope our findings add interpretability to RNN architectures and provide inspiration for new architectures for sequential data.
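The abstract does not detail how the n-gram components are extracted from the hidden states. As a point of reference for the kind of baseline the final experiment implies, here is a minimal sketch of using plain n-gram features alone as a sentence encoder for sentiment classification; the corpus and labels are illustrative toy data, not from the paper.

```python
# Minimal sketch: plain n-gram features as a standalone sentence encoder
# for sentiment classification. An illustrative baseline in the spirit of
# the abstract's last experiment, not the paper's actual method.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy corpus (hypothetical), including the negation pattern the abstract
# highlights, which unigram features alone would tend to misread.
texts = [
    "a great movie", "not a great movie",
    "truly awful acting", "not awful at all",
    "i loved it", "i did not love it",
]
labels = [1, 0, 0, 1, 1, 0]

# Including bigrams lets the classifier treat "not a" / "not awful" as
# units, loosely mimicking negation-sensitive n-gram components.
model = make_pipeline(
    CountVectorizer(ngram_range=(1, 2)),
    LogisticRegression(),
)
model.fit(texts, labels)

# Classify an unseen negated sentence.
print(model.predict(["not a great film"]))
```

With only bigram evidence such as "not a" available at test time, the model can separate negated from plain sentiment, which is the sort of behavior the abstract attributes to the n-gram-like components found inside RNN hidden states.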