ShufText: A Simple Black Box Approach to Evaluate the Fragility of Text Classification Models

Text classification is among the most basic natural language processing tasks, with applications ranging from sentiment analysis to topic classification. Recently, deep learning approaches based on CNNs, LSTMs, and Transformers have become the de facto standard for text classification. In this work, we highlight a common issue with these approaches: the models over-rely on the important words in the text that are useful for classification. With limited training data and a discriminative training strategy, they tend to ignore the semantic meaning of the sentence and focus instead on keywords or important n-grams. We propose a simple black box technique, ShufText, to expose this shortcoming and identify the model's over-reliance on keywords. It involves randomly shuffling the words in a sentence and evaluating the classification accuracy on the shuffled text. On common text classification datasets, shuffling has very little effect, and with high probability the models still predict the original class. We also evaluate the effect of language model pretraining on these models and try to answer questions about model robustness to out-of-domain sentences. We show that simple models based on CNN or LSTM, as well as complex models like BERT, are questionable in terms of their syntactic and semantic understanding.
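The shuffling test described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `shuffle_words` and `shuffled_accuracy`, the whitespace tokenization, and the toy `keyword_classifier` are all hypothetical. The model is treated as a black box that maps a list of sentences to a list of predicted labels.

```python
import random
from typing import Callable, List, Sequence


def shuffle_words(sentence: str, rng: random.Random) -> str:
    """Return the sentence with its whitespace-separated words in random order."""
    words = sentence.split()  # assumption: simple whitespace tokenization
    rng.shuffle(words)
    return " ".join(words)


def shuffled_accuracy(
    predict: Callable[[List[str]], List[str]],
    sentences: Sequence[str],
    labels: Sequence[str],
    seed: int = 0,
) -> float:
    """Accuracy of a black-box classifier on word-shuffled inputs.

    `predict` is any function from a list of sentences to a list of labels;
    no access to model internals is needed.
    """
    if len(sentences) != len(labels) or not labels:
        raise ValueError("sentences and labels must be non-empty and equal in length")
    rng = random.Random(seed)
    shuffled = [shuffle_words(s, rng) for s in sentences]
    predictions = predict(shuffled)
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def keyword_classifier(sentences: List[str]) -> List[str]:
    """Hypothetical stand-in model that only looks for the keyword 'great'."""
    return ["pos" if "great" in s.split() else "neg" for s in sentences]


if __name__ == "__main__":
    data = ["the movie was great", "the plot was dull"]
    gold = ["pos", "neg"]
    print(shuffled_accuracy(keyword_classifier, data, gold))
```

A model that depends only on keywords, like the toy classifier here, scores the same on shuffled and original text. Comparing `shuffled_accuracy` with the accuracy on unshuffled inputs gives a rough measure of how much the model relies on word order.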
