Realizing when a model is right for the wrong reason is not trivial and requires significant effort from model developers. In some cases, an input salience method, which highlights the most important parts of the input, may reveal problematic reasoning. But scrutinizing highlights over many data instances is tedious and often infeasible. Furthermore, analyzing examples in isolation does not reveal general patterns in the data or in the model's behavior. In this paper we aim to address these issues and go from understanding single examples to understanding entire datasets and models. The methodology we propose is based on aggregated salience maps. Using this methodology we address multiple distinct but common model developer needs by showing how problematic data and model behavior can be identified -- a necessary first step for improving the model.
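To make the aggregation idea concrete, the following is a minimal sketch, not the paper's implementation, of how per-example token salience maps might be pooled into a dataset-level view; the dictionary input format, the per-example normalization, and the mean-pooling choice are illustrative assumptions.

```python
from collections import defaultdict

def aggregate_salience(salience_maps):
    """Pool per-example token salience scores into dataset-level statistics.

    salience_maps: iterable of dicts mapping token -> salience score
    Returns a dict mapping token -> (mean normalized salience, example count).
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for example in salience_maps:
        # Normalize within each example so long inputs do not dominate the aggregate.
        norm = sum(abs(s) for s in example.values()) or 1.0
        for token, score in example.items():
            totals[token] += abs(score) / norm
            counts[token] += 1
    return {t: (totals[t] / counts[t], counts[t]) for t in totals}

# Toy usage with two hypothetical examples and made-up salience scores.
maps = [
    {"not": 0.7, "good": 0.2, "movie": 0.1},
    {"not": 0.6, "bad": 0.3, "plot": 0.1},
]
for token, (mean_sal, n) in sorted(
    aggregate_salience(maps).items(), key=lambda kv: -kv[1][0]
):
    print(f"{token}\t{mean_sal:.3f}\t(seen in {n} examples)")
```

Tokens that consistently receive high salience across many instances surface as candidates for spurious shortcuts, which is the kind of dataset-level pattern that inspecting individual highlight maps would miss.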