Automated predictions require explanations to be interpretable by humans. One type of explanation is a rationale, i.e., a selection of input features such as relevant text snippets from which the model computes the outcome. However, a single overall selection does not provide a complete explanation, e.g., when a decision weighs several aspects. To this end, we present a novel self-interpretable model called ConRAT. Inspired by how human explanations for high-level decisions are often based on key concepts, ConRAT extracts a set of text snippets as concepts and infers which ones are described in the document. Then, it explains the outcome with a linear aggregation of concepts. Two regularizers drive ConRAT to build interpretable concepts. In addition, we propose two techniques to further improve rationale and predictive performance. Experiments on both single- and multi-aspect sentiment classification tasks show that ConRAT is the first to generate concepts that align with human rationalization while using only the overall label. Further, it outperforms state-of-the-art methods trained on each aspect label independently.
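The abstract describes a pipeline in which concepts are extracted as text snippets, their presence in the document is inferred, and the outcome is a linear aggregation of concept-level contributions. The following is a minimal sketch of that flow, not the authors' implementation; the module names, dimensions, and the soft (rather than hard) snippet selection are all assumptions for illustration, and the two regularizers and the two additional techniques mentioned in the abstract are omitted.

```python
# Minimal sketch (not the authors' code) of a concept-based rationalizer:
# K per-concept selectors pick snippets, a gate infers whether each concept
# is described, and the outcome is a linear aggregation of concept scores.
import torch
import torch.nn as nn

class ConceptRationalizerSketch(nn.Module):
    def __init__(self, hidden_dim: int = 256, num_concepts: int = 5):
        super().__init__()
        self.num_concepts = num_concepts
        # One selector per concept scores tokens to select a text snippet.
        self.selectors = nn.ModuleList(
            [nn.Linear(hidden_dim, 1) for _ in range(num_concepts)]
        )
        # Gate deciding whether each concept is described in the document.
        self.concept_gate = nn.Linear(hidden_dim, 1)
        # Per-concept predictor; the outcome is a linear aggregation of them.
        self.concept_clf = nn.Linear(hidden_dim, 1)
        self.aggregation = nn.Linear(num_concepts, 1, bias=False)

    def forward(self, token_states: torch.Tensor):
        # token_states: (batch, seq_len, hidden_dim) from any text encoder.
        concept_scores, rationales = [], []
        for k in range(self.num_concepts):
            # Soft selection of a snippet for concept k (hard selection and
            # interpretability regularizers are omitted in this sketch).
            attn = torch.softmax(self.selectors[k](token_states), dim=1)
            rationales.append(attn)                     # tokens forming concept k
            concept_vec = (attn * token_states).sum(1)  # (batch, hidden_dim)
            presence = torch.sigmoid(self.concept_gate(concept_vec))
            score = self.concept_clf(concept_vec)
            concept_scores.append(presence * score)     # gated concept contribution
        # Linear aggregation of concept-level scores into the final outcome.
        stacked = torch.cat(concept_scores, dim=-1)     # (batch, num_concepts)
        return self.aggregation(stacked), rationales


# Usage with random features standing in for encoder outputs.
model = ConceptRationalizerSketch()
dummy = torch.randn(2, 40, 256)
logits, rationales = model(dummy)
print(logits.shape)  # torch.Size([2, 1])
```

The returned per-concept attention maps act as rationales, and the single linear aggregation layer makes each concept's contribution to the prediction directly inspectable.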