Recent years have witnessed increasing interest in developing interpretable models in Natural Language Processing (NLP). Most existing models aim to identify input features, such as words or phrases, that are important for model predictions. Neural models developed in NLP, however, often compose word semantics in a hierarchical manner. Interpretation based on words or phrases alone therefore cannot faithfully explain model decisions. This paper proposes a novel Hierarchical INTerpretable neural text classifier, called Hint, which can automatically generate explanations of model predictions in the form of label-associated topics in a hierarchical manner. Model interpretation is no longer at the word level, but is instead built on topics as the basic semantic unit. Experimental results on both review and news datasets show that our proposed approach achieves text classification results on par with existing state-of-the-art text classifiers, and generates interpretations that are more faithful to model predictions and better understood by humans than those of other interpretable neural text classifiers.
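To make topic-level interpretation concrete, below is a minimal sketch of the general idea: a classifier that predicts from topic activations rather than individual word importances, so each prediction can be explained as a distribution over topics. This is not the Hint architecture described in the paper; all module names, dimensions, and the mean-pooling document representation are illustrative assumptions.

import torch
import torch.nn as nn

class TopicLevelClassifier(nn.Module):
    """Hypothetical sketch: classify from topic activations, not words."""

    def __init__(self, vocab_size=1000, emb_dim=64, n_topics=8, n_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # Each row of `topics` is a learned topic vector; a document is
        # summarized by how strongly it activates each topic.
        self.topics = nn.Parameter(torch.randn(n_topics, emb_dim))
        self.classify = nn.Linear(n_topics, n_classes)

    def forward(self, token_ids):
        # token_ids: (batch, seq_len)
        words = self.embed(token_ids)          # (batch, seq_len, emb_dim)
        doc = words.mean(dim=1)                # (batch, emb_dim), toy pooling
        # Topic activations: similarity between document and topic vectors.
        acts = torch.softmax(doc @ self.topics.T, dim=-1)  # (batch, n_topics)
        logits = self.classify(acts)
        # Return topic activations alongside logits so the prediction can be
        # explained in terms of topics instead of single words.
        return logits, acts

model = TopicLevelClassifier()
ids = torch.randint(0, 1000, (1, 12))          # a toy 12-token document
logits, topic_acts = model(ids)
print("predicted class:", logits.argmax(dim=-1).item())
print("topic activations:", topic_acts.detach().numpy().round(3))

In this sketch the explanation for a prediction is the vector of topic activations; a hierarchical model such as Hint would instead compose such topic-level semantics across multiple levels, which this flat example does not attempt to reproduce.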