Neural models based on pre-trained transformers, such as BERT or XLM-RoBERTa, demonstrate SOTA results in many NLP tasks, including non-topical classification, such as genre identification. However, these approaches are often unreliable under minor alterations of the test texts. A related problem concerns topical biases in the training corpus: for example, the prevalence of words on a specific topic within a specific genre can trick the classifier into assigning any text on that topic to that genre. To mitigate this reliability problem, this paper investigates techniques for attacking genre classifiers in order to understand the limitations of transformer models and to improve their performance. While simple text attacks, such as those based on replacing keywords extracted by tf-idf, are not capable of deceiving powerful models like XLM-RoBERTa, we show that embedding-based algorithms which replace some of the most ``significant'' words with similar ones, such as TextFooler, can change model predictions in a significant proportion of cases.
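As a minimal sketch of the simple tf-idf-based attack mentioned above, one can extract each document's highest-weighted tf-idf words and substitute them. The function names and the placeholder substitute token are illustrative assumptions, not the paper's exact implementation:

```python
# Illustrative sketch of a tf-idf keyword-replacement attack.
# Names (top_tfidf_words, replace_keywords, "UNK") are placeholders.
import re
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

def top_tfidf_words(texts, k=10):
    """Return the k highest-weighted tf-idf words for each text."""
    vectorizer = TfidfVectorizer()
    weights = vectorizer.fit_transform(texts)  # (n_docs, vocab) sparse matrix
    vocab = np.array(vectorizer.get_feature_names_out())
    return [vocab[np.argsort(row.toarray().ravel())[::-1][:k]].tolist()
            for row in weights]

def replace_keywords(text, keywords, substitute="UNK"):
    """Replace each extracted keyword with a fixed substitute token."""
    for word in keywords:
        text = re.sub(rf"\b{re.escape(word)}\b", substitute, text,
                      flags=re.IGNORECASE)
    return text
```

Even with the most topically salient words masked this way, a strong classifier typically retains enough genre signal, which is consistent with the claim that such attacks fail against XLM-RoBERTa.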
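The embedding-based TextFooler attack can be reproduced with the open-source TextAttack library. The sketch below is an assumed setup, not the paper's exact configuration: the checkpoint name is a placeholder for a genre classifier fine-tuned on one's own data, and the example text and label are invented.

```python
# Sketch: running TextFooler against a transformer genre classifier
# via the TextAttack library (https://github.com/QData/TextAttack).
import transformers
from textattack import Attacker, AttackArgs
from textattack.attack_recipes import TextFoolerJin2019
from textattack.datasets import Dataset
from textattack.models.wrappers import HuggingFaceModelWrapper

# Placeholder checkpoint; substitute a model fine-tuned for genre labels.
model_name = "xlm-roberta-base"
model = transformers.AutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = transformers.AutoTokenizer.from_pretrained(model_name)
wrapper = HuggingFaceModelWrapper(model, tokenizer)

# TextFooler: greedily replaces the most "significant" words with
# nearest neighbours in a counter-fitted embedding space.
attack = TextFoolerJin2019.build(wrapper)

# Hypothetical (text, gold-label) pair for demonstration only.
data = Dataset([("An example news report about a football match.", 0)])
attacker = Attacker(attack, data, AttackArgs(num_examples=1))
results = attacker.attack_dataset()  # each result records the perturbed text
```

Because TextFooler selects words by their measured influence on the model's prediction rather than by raw frequency statistics, it succeeds in a significant proportion of cases where the tf-idf baseline does not.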