Recent years have witnessed increasing interest in developing interpretable models in Natural Language Processing (NLP). Most existing models aim to identify input features such as words or phrases that are important for model predictions. However, neural models in NLP often compose word semantics hierarchically, and text classification requires hierarchical modelling to aggregate local information in order to deal with topic and label shifts more effectively. As such, interpretation by words or phrases alone cannot faithfully explain model decisions in text classification. This paper proposes a novel Hierarchical INTerpretable neural text classifier, called Hint, which automatically generates explanations of model predictions in the form of label-associated topics in a hierarchical manner. Model interpretation is no longer at the word level but is instead built on topics as the basic semantic unit. Experimental results on both review and news datasets show that our proposed approach achieves text classification results on par with existing state-of-the-art text classifiers, and generates interpretations that are more faithful to model predictions and better understood by humans than those of other interpretable neural text classifiers.