Modern machine learning models are complex. Most of them rely on convoluted latent representations of their input to issue a prediction. To achieve greater transparency than a black box that connects inputs to predictions, it is necessary to gain a deeper understanding of these latent representations. To this end, we propose SimplEx: a user-centred method that provides example-based explanations with reference to a freely selected set of examples, called the corpus. SimplEx uses the corpus to improve the user's understanding of the latent space with post-hoc explanations answering two questions: (1) Which corpus examples explain the prediction issued for a given test example? (2) What features of these corpus examples are relevant for the model to relate them to the test example? SimplEx provides an answer by reconstructing the test latent representation as a mixture of corpus latent representations. Further, we propose a novel approach, the Integrated Jacobian, that allows SimplEx to make explicit the contribution of each corpus feature in the mixture. Through experiments on tasks ranging from mortality prediction to image classification, we demonstrate that these decompositions are robust and accurate. With illustrative use cases in medicine, we show that SimplEx empowers the user by highlighting relevant patterns in the corpus that explain model representations. Moreover, we demonstrate how the freedom in choosing the corpus allows the user to have personalized explanations in terms of examples that are meaningful for them.
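The core decomposition described above can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation: it assumes corpus latent representations are stacked as rows of a NumPy array and fits mixture weights constrained to the probability simplex (via a softmax reparameterization) so that the weighted sum of corpus latents approximates the test latent. The Integrated Jacobian feature attribution is omitted, and the function name and hyperparameters are hypothetical.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax, mapping logits onto the probability simplex."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def simplex_decompose(corpus, test_latent, steps=2000, lr=0.5):
    """Fit weights w on the probability simplex so that
    sum_i w[i] * corpus[i] approximates test_latent.

    corpus:      (n, d) array, one latent representation per corpus example
    test_latent: (d,) latent representation of the test example
    Returns w, an (n,) array of non-negative weights summing to 1.
    """
    n = corpus.shape[0]
    theta = np.zeros(n)  # unconstrained logits; w = softmax(theta)
    for _ in range(steps):
        w = softmax(theta)
        residual = corpus.T @ w - test_latent       # reconstruction error, (d,)
        g_w = 2.0 * corpus @ residual               # gradient of ||residual||^2 w.r.t. w
        g_theta = w * g_w - w * (w @ g_w)           # chain rule through the softmax
        theta -= lr * g_theta
    return softmax(theta)

# Toy usage: the test latent lies between the first two corpus latents,
# so those two examples should receive almost all of the weight.
corpus = np.array([[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]])
test_latent = np.array([0.5, 0.5])
weights = simplex_decompose(corpus, test_latent)
```

The simplex constraint is what makes the weights readable as an explanation: each weight is the fraction of the test representation attributed to the corresponding corpus example.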