Recent deep-learning models have achieved impressive predictive performance by learning complex functions of many variables, often at the cost of interpretability. This chapter covers recent work aiming to interpret models by attributing importance to features and feature groups for a single prediction. Importantly, the proposed attributions assign importance to interactions between features, in addition to features in isolation. These attributions are shown to yield insights across real-world domains, including bio-imaging, cosmological imaging, and natural-language processing. We then show how these attributions can be used to directly improve the generalization of a neural network or to distill it into a simple model. Throughout the chapter, we emphasize the use of reality checks to scrutinize the proposed interpretation techniques.