To build an interpretable neural text classifier, most prior work has focused on designing inherently interpretable models or finding faithful explanations. A new line of work on improving model interpretability has only recently begun, and many existing methods require either prior information or human annotations as additional inputs during training. To address this limitation, we propose the variational word mask (VMASK) method, which automatically learns task-specific important words and reduces irrelevant information for classification, ultimately improving the interpretability of model predictions. We evaluate the proposed method with three neural text classifiers (CNN, LSTM, and BERT) on seven benchmark text classification datasets. Experiments show the effectiveness of VMASK in improving both prediction accuracy and interpretability.
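The abstract does not spell out how a learned word mask is applied. As a rough, hedged illustration only (not the paper's actual architecture, which is a neural layer trained end-to-end with a variational objective), one common way to realize a differentiable binary word mask is to sample a relaxed Bernoulli value per word via the Gumbel-softmax trick and multiply it into the word's embedding. The function names and the two-logit parameterization below are illustrative assumptions:

```python
import math
import random

def gumbel_softmax_binary(logit, tau=0.5):
    """Sample a relaxed Bernoulli mask value in (0, 1) using the
    Gumbel-softmax trick over the two logits [logit, 0].
    Higher `logit` pushes the mask toward 1 ("keep the word")."""
    g_keep = -math.log(-math.log(random.random() + 1e-12) + 1e-12)
    g_drop = -math.log(-math.log(random.random() + 1e-12) + 1e-12)
    a = math.exp((logit + g_keep) / tau)
    b = math.exp((0.0 + g_drop) / tau)
    return a / (a + b)

def mask_embeddings(embeddings, logits, tau=0.5):
    """Scale each word embedding by its sampled soft mask,
    down-weighting words deemed unimportant for the task."""
    masked = []
    for vec, logit in zip(embeddings, logits):
        m = gumbel_softmax_binary(logit, tau)
        masked.append([m * x for x in vec])
    return masked
```

In a full model, the per-word logits would themselves be produced by a network and trained jointly with the classifier, with a KL-style regularizer controlling how much information the mask lets through.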