Prototype-based interpretability methods provide intuitive explanations of model predictions by comparing samples, in terms of similarity, to a reference set of memorized exemplars or typical representatives. For sequential data modeling, prototype similarity is usually computed on encoded representation vectors. However, because the encoding functions are highly recursive, there is often a non-negligible disparity between the prototype-based explanations and the original input. In this work, we propose a Self-Explaining Selective Model (SESM) that uses a linear combination of prototypical concepts to explain its own predictions. Following the idea of case-based reasoning, the model selects sub-sequences of the input that most strongly activate different concepts as prototypical parts, which users can compare to sub-sequences selected from different example inputs to understand model decisions. For better interpretability, we design multiple constraints, including diversity, stability, and locality, as training objectives. Extensive experiments in different domains demonstrate that our method exhibits promising interpretability and competitive accuracy.