What makes two images similar? We propose new approaches to generate model-agnostic explanations for image similarity, search, and retrieval. In particular, we extend Class Activation Maps (CAMs), Shapley Additive Explanations (SHAP), and Local Interpretable Model-Agnostic Explanations (LIME) to the domain of image retrieval and search. These approaches enable black-box and grey-box model introspection, and can help diagnose errors and explain the rationale behind a model's similarity judgments. Furthermore, we extend these approaches to extract a full pairwise correspondence between the query and retrieved image pixels, an approach we call "joint interpretations". Formally, we show that joint search interpretations arise from projecting Harsanyi dividends, and that this approach generalizes Shapley values and the Shapley-Taylor indices. We introduce a fast kernel-based method for estimating Shapley-Taylor indices and empirically show that these game-theoretic measures yield more consistent explanations for image similarity architectures.
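As a small illustration of the game-theoretic quantities named above, the following Python sketch computes Harsanyi dividends for a three-player toy game and recovers Shapley values from them by splitting each coalition's dividend equally among its members, a standard identity. It is not the kernel-based estimator introduced in this work: it enumerates every coalition by brute force, and the value function `toy_similarity`, its `WEIGHTS`, and the idea of treating three image regions as players are hypothetical choices made only for the example.

```python
from itertools import combinations, permutations
from math import factorial


def subsets(items):
    """Yield every subset of `items` as a frozenset, smallest first."""
    items = tuple(items)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)


def harsanyi_dividends(players, v):
    """d(S) = sum over T subset of S of (-1)^(|S| - |T|) * v(T)."""
    return {
        S: sum((-1) ** (len(S) - len(T)) * v(T) for T in subsets(S))
        for S in subsets(players)
    }


def shapley_from_dividends(players, dividends):
    """Split each coalition's dividend equally among its members."""
    return {
        i: sum(d / len(S) for S, d in dividends.items() if i in S)
        for i in players
    }


def shapley_from_permutations(players, v):
    """Reference definition: average marginal gain over all orderings."""
    totals = {i: 0.0 for i in players}
    for order in permutations(players):
        seen = frozenset()
        for i in order:
            totals[i] += v(seen | {i}) - v(seen)
            seen = seen | {i}
    n_orders = factorial(len(players))
    return {i: t / n_orders for i, t in totals.items()}


# Hypothetical value function: the "similarity" retained when only the
# image regions in S are kept. Regions 0 and 1 interact; region 2 does not.
WEIGHTS = {0: 0.5, 1: 0.25, 2: 0.125}


def toy_similarity(S):
    score = sum(WEIGHTS[i] for i in S)
    if 0 in S and 1 in S:
        score += 0.125  # pairwise interaction between regions 0 and 1
    return score


PLAYERS = (0, 1, 2)
dividends = harsanyi_dividends(PLAYERS, toy_similarity)
phi = shapley_from_dividends(PLAYERS, dividends)
phi_check = shapley_from_permutations(PLAYERS, toy_similarity)

for i in PLAYERS:
    print(i, phi[i], phi_check[i])
```

In this toy game the only nonzero dividend above the singletons belongs to the pair {0, 1}, so the dividend table exposes the pairwise interaction directly, while the Shapley values fold half of it into each of regions 0 and 1. Exhaustive enumeration grows exponentially with the number of players, which is why approximate estimators are needed at the scale of real images.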