End-to-end neural NLP architectures are notoriously difficult to understand, which has given rise to numerous efforts toward model explainability in recent years. An essential principle of model explanation is Faithfulness: an explanation should accurately represent the reasoning process behind the model's prediction. This survey first discusses the definition and evaluation of Faithfulness, as well as its significance for explainability. We then introduce recent advances in faithful explanation by grouping approaches into five categories: similarity methods, analysis of model-internal structures, backpropagation-based methods, counterfactual intervention, and self-explanatory models. Each category is illustrated with its representative studies, advantages, and shortcomings. Finally, we discuss all of the above methods in terms of their common virtues and limitations, and reflect on future directions for faithful explainability. For researchers interested in studying interpretability, this survey offers an accessible and comprehensive overview of the area, laying the basis for further exploration. For users hoping to better understand their own models, this survey serves as an introductory manual that helps with choosing the most suitable explanation method(s).