Prototype-based interpretability methods provide intuitive explanations of model predictions by comparing samples, in terms of similarity, to a reference set of memorized exemplars or typical representatives. In sequential data modeling, prototype similarity is usually computed on encoded representation vectors. However, because these representations are produced by highly recursive functions, there is often a non-negligible disparity between the prototype-based explanations and the original input. In this work, we propose a Self-Explaining Selective Model (SESM) that uses a linear combination of prototypical concepts to explain its own predictions. Following the idea of case-based reasoning, the model selects, as prototypical parts, the sub-sequences of the input that most strongly activate different concepts; users can compare these to sub-sequences selected from other example inputs to understand the model's decisions. For better interpretability, we design multiple constraints, including diversity, stability, and locality, as training objectives. Extensive experiments in different domains demonstrate that our method exhibits promising interpretability and competitive accuracy.
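To make the selection-and-combination idea concrete, the sketch below is a minimal, hypothetical illustration (not the authors' implementation) of how a model of this kind could score sliding sub-sequence windows against learned concept vectors, keep each concept's most activating window as its prototypical part, and form the prediction as a linear combination of the per-concept activations. The class name, window pooling, cosine similarity, and all hyperparameters are assumptions introduced here for illustration only.

```python
# Hypothetical sketch of a selective prototype/concept mechanism.
# Not the SESM implementation; assumptions: windowed mean pooling,
# cosine similarity to concept vectors, linear combination of scores.
import torch
import torch.nn as nn


class SelectiveConceptSketch(nn.Module):
    def __init__(self, d_model: int, n_concepts: int, n_classes: int, window: int = 3):
        super().__init__()
        self.window = window
        # Learned concept vectors; similarity to them gives concept activations.
        self.concepts = nn.Parameter(torch.randn(n_concepts, d_model))
        # Linear layer that combines concept activations into class logits.
        self.classifier = nn.Linear(n_concepts, n_classes)

    def forward(self, h: torch.Tensor):
        # h: (batch, seq_len, d_model) encoded token representations.
        # Average-pool sliding windows to get sub-sequence representations.
        windows = h.unfold(1, self.window, 1).mean(dim=-1)    # (B, n_win, d_model)
        # Cosine similarity between every window and every concept.
        sims = torch.einsum(
            "bwd,cd->bwc",
            nn.functional.normalize(windows, dim=-1),
            nn.functional.normalize(self.concepts, dim=-1),
        )                                                      # (B, n_win, n_concepts)
        # For each concept, keep only its most activating window: the
        # "prototypical part" users can inspect and compare across examples.
        scores, best_windows = sims.max(dim=1)                 # (B, n_concepts)
        logits = self.classifier(scores)                       # linear combination
        return logits, scores, best_windows


# Usage: the indices in `best_windows` point to the sub-sequences that
# explain the prediction for each concept.
model = SelectiveConceptSketch(d_model=16, n_concepts=4, n_classes=2)
logits, scores, best_windows = model(torch.randn(2, 10, 16))
```

The diversity, stability, and locality constraints mentioned in the abstract would enter as additional training losses on the concept activations; they are not shown in this sketch.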