Post-hoc explanations for black-box models have been studied extensively in classification and regression settings. However, explanations for models that output the similarity between two inputs have received comparatively less attention. In this paper, we provide model-agnostic local explanations for similarity learners, applicable to tabular and text data. We first propose a method that provides feature attributions to explain the similarity between a pair of inputs as determined by a black-box similarity learner. We then propose analogies as a new form of explanation in machine learning. Here, the goal is to identify diverse analogous pairs of examples that share the same level of similarity as the input pair and provide insight into the (latent) factors underlying the model's prediction. The selection of analogies can optionally leverage feature attributions, thus connecting the two forms of explanation while keeping them complementary. We prove that our analogy objective function is submodular, which makes the search for good-quality analogies efficient. We apply the proposed approaches to explain similarities between sentences as predicted by a state-of-the-art sentence encoder, and between patients in a healthcare utilization application. Efficacy is measured through quantitative evaluations, a careful user study, and examples of explanations.
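To illustrate why submodularity makes the search for analogies efficient, the following is a minimal sketch of greedy selection under a size budget. It is not the paper's objective or algorithm: the `Candidate` record, the `make_objective` function, the weight `lam`, and the toy data are all hypothetical stand-ins. The stand-in objective adds a modular term (how close each candidate pair's similarity is to the input pair's similarity, assuming similarities lie in [0, 1]) to a coverage term (the number of distinct groups represented), which is monotone and submodular. For objectives of that kind, greedy selection is known to reach at least a (1 - 1/e) fraction of the optimal value.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical candidate record (an assumption for illustration):
# (name, black-box similarity of the candidate pair in [0, 1], group label).
Candidate = Tuple[str, float, str]
Objective = Callable[[Sequence[Candidate]], float]


def make_objective(target_sim: float, lam: float) -> Objective:
    """Stand-in objective, NOT the paper's: closeness to the target similarity
    (a modular term) plus lam times the number of distinct groups covered
    (a coverage term). Both terms are monotone and submodular."""

    def objective(selected: Sequence[Candidate]) -> float:
        closeness = sum(1.0 - abs(sim - target_sim) for _, sim, _ in selected)
        diversity = len({group for _, _, group in selected})
        return closeness + lam * diversity

    return objective


def greedy_select(candidates: Sequence[Candidate],
                  objective: Objective,
                  k: int) -> List[Candidate]:
    """Pick up to k candidates, each time adding the one with the largest
    marginal gain in the objective."""
    selected: List[Candidate] = []
    remaining = list(candidates)
    current = objective(selected)
    for _ in range(min(k, len(remaining))):
        best, best_gain = None, float("-inf")
        for cand in remaining:
            gain = objective(selected + [cand]) - current
            if gain > best_gain:
                best, best_gain = cand, gain
        selected.append(best)
        remaining.remove(best)
        current += best_gain
    return selected


# Toy data (made up): the input pair is assumed to have similarity 0.8.
CANDIDATES: List[Candidate] = [
    ("a", 0.80, "sports"),
    ("b", 0.79, "sports"),
    ("c", 0.70, "finance"),
    ("d", 0.30, "health"),
]

if __name__ == "__main__":
    picked = greedy_select(CANDIDATES, make_objective(target_sim=0.8, lam=0.5), k=2)
    print([name for name, _, _ in picked])
```

With these toy values the sketch selects `a` and then `c`: candidate `b` is closer to the target similarity than `c`, but it repeats a group already covered, so its marginal gain is smaller. This mirrors the trade-off between matching the input pair's similarity level and keeping the analogies diverse.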