As deep learning techniques have been adopted across a growing range of fields over the past decade, complaints about the opacity of these black-box models have mounted, prompting an increased focus on transparency in deep learning models. This work surveys methods for improving the interpretability of deep neural networks for natural language processing (NLP) tasks, including machine translation and sentiment analysis. We begin with a comprehensive discussion of the definition of the term \textit{interpretability} and its various facets. The methods collected and summarised in this survey concern only local interpretation and are divided into three categories: 1) explaining the model's predictions through related input features; 2) explaining through natural language explanations; 3) probing the hidden states of models and word representations.