Developing a suitable Deep Neural Network (DNN) often requires significant iteration, in which different model versions are evaluated and compared. While metrics such as accuracy are a powerful means to succinctly describe a model's performance across a dataset or to directly compare model versions, practitioners often wish to gain a deeper understanding of the factors that influence a model's predictions. Interpretability techniques such as gradient-based methods and local approximations can be used to examine small sets of inputs in fine detail, but it can be hard to determine whether results from such small sets generalize across a dataset. We introduce IMACS, a method that combines gradient-based model attributions with aggregation and visualization techniques to summarize differences in attributions between two DNN image models. More specifically, IMACS extracts salient input features from an evaluation dataset, clusters them by similarity, and then visualizes differences in model attributions for similar input features. In this work, we introduce a framework for aggregating, summarizing, and comparing the attribution information for two models across a dataset; present visualizations that highlight differences between two image classification models; and show how our technique can uncover behavioral differences caused by domain shift between two models trained on satellite images.
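The sketch below is a minimal, hypothetical illustration of the attribution-comparison pipeline described above, assuming two PyTorch image classifiers and an evaluation DataLoader. The function names (saliency, top_patch, compare_models), the fixed patch size, and the use of k-means over flattened patch pixels are simplifying assumptions for illustration only; they are not IMACS's actual feature extraction, similarity clustering, or attribution method.

```python
# Illustrative sketch only: compare gradient-based attributions of two models,
# extract salient patches, cluster similar patches, and aggregate per-cluster
# attribution differences. Not the IMACS implementation.
import numpy as np
import torch
import torch.nn.functional as F
from sklearn.cluster import KMeans


def saliency(model, images, labels):
    """Simple gradient attribution: |d score_label / d input|, summed over channels."""
    images = images.clone().requires_grad_(True)
    scores = model(images).gather(1, labels.view(-1, 1)).sum()
    scores.backward()
    return images.grad.abs().sum(dim=1)  # shape (N, H, W)


def top_patch(image, attribution, size=32):
    """Crop the size x size window with the largest total attribution."""
    pooled = F.avg_pool2d(attribution[None, None], size, stride=1)
    idx = pooled.flatten().argmax().item()
    y, x = divmod(idx, pooled.shape[-1])
    return image[:, y:y + size, x:x + size]


def compare_models(model_a, model_b, loader, n_clusters=10, patch_size=32):
    """Cluster salient patches from the evaluation set, then report the mean
    attribution difference between the two models within each cluster."""
    patches, diffs = [], []
    for images, labels in loader:
        attr_a = saliency(model_a, images, labels)
        attr_b = saliency(model_b, images, labels)
        for img, a, b in zip(images, attr_a, attr_b):
            # Cluster on raw patch pixels here; a learned embedding would be
            # a more faithful notion of "similar input features".
            patches.append(top_patch(img, a, patch_size).flatten().detach().numpy())
            diffs.append((a - b).abs().mean().item())
    clusters = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(np.stack(patches))
    diffs = np.array(diffs)
    return {c: diffs[clusters == c].mean() for c in range(n_clusters)}
```

Per-cluster summaries of this kind are what would then be visualized to highlight where the two models attend to similar input features but attribute importance differently.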