Neural networks for NLP are becoming increasingly complex and widespread, and there is growing concern over whether these models are responsible to use. Explaining models helps to address safety and ethical concerns and is essential for accountability. Interpretability serves to provide these explanations in terms that are understandable to humans. Additionally, post-hoc methods provide explanations after a model is learned and are generally model-agnostic. This survey provides a categorization of how recent post-hoc interpretability methods communicate explanations to humans, discusses each method in depth, and examines how these methods are validated, as validation is a common concern.