Neural models based on pre-trained transformers, such as BERT or XLM-RoBERTa, demonstrate SOTA results in many NLP tasks, including non-topical classification, such as genre identification. However, these approaches are often unreliable under minor alterations of the test texts. A related problem concerns topical biases in the training corpus: for example, the prevalence of words on a specific topic within a specific genre can trick the classifier into assigning any text on that topic to that genre. To mitigate this reliability problem, this paper investigates techniques for attacking genre classifiers in order to understand the limitations of transformer models and to improve their performance. While simple text attacks, such as those based on replacing keywords extracted by tf-idf, are not capable of deceiving powerful models like XLM-RoBERTa, we show that embedding-based algorithms which replace some of the most ``significant'' words with similar ones, such as TextFooler, can change model predictions in a significant proportion of cases.
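As a minimal sketch of the simple tf-idf-based attack mentioned above, one can extract each document's highest-weighted tf-idf words and substitute them. The function names and the placeholder substitute token are illustrative assumptions, not the paper's exact implementation:

```python
# Illustrative sketch of a tf-idf keyword-replacement attack.
# Names (top_tfidf_words, replace_keywords, "UNK") are placeholders.
import re
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

def top_tfidf_words(texts, k=10):
    """Return the k highest-weighted tf-idf words for each text."""
    vectorizer = TfidfVectorizer()
    weights = vectorizer.fit_transform(texts)  # (n_docs, vocab) sparse matrix
    vocab = np.array(vectorizer.get_feature_names_out())
    return [vocab[np.argsort(row.toarray().ravel())[::-1][:k]].tolist()
            for row in weights]

def replace_keywords(text, keywords, substitute="UNK"):
    """Replace each extracted keyword with a fixed substitute token."""
    for word in keywords:
        text = re.sub(rf"\b{re.escape(word)}\b", substitute, text,
                      flags=re.IGNORECASE)
    return text
```

Even with the most topically salient words masked this way, a strong classifier typically retains enough genre signal, which is consistent with the claim that such attacks fail against XLM-RoBERTa.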
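The embedding-based TextFooler attack can be reproduced with the open-source TextAttack library. The sketch below is an assumed setup, not the paper's exact configuration: the checkpoint name is a placeholder for a genre classifier fine-tuned on one's own data, and the example text and label are invented.

```python
# Sketch: running TextFooler against a transformer genre classifier
# via the TextAttack library (https://github.com/QData/TextAttack).
import transformers
from textattack import Attacker, AttackArgs
from textattack.attack_recipes import TextFoolerJin2019
from textattack.datasets import Dataset
from textattack.models.wrappers import HuggingFaceModelWrapper

# Placeholder checkpoint; substitute a model fine-tuned for genre labels.
model_name = "xlm-roberta-base"
model = transformers.AutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = transformers.AutoTokenizer.from_pretrained(model_name)
wrapper = HuggingFaceModelWrapper(model, tokenizer)

# TextFooler: greedily replaces the most "significant" words with
# nearest neighbours in a counter-fitted embedding space.
attack = TextFoolerJin2019.build(wrapper)

# Hypothetical (text, gold-label) pair for demonstration only.
data = Dataset([("An example news report about a football match.", 0)])
attacker = Attacker(attack, data, AttackArgs(num_examples=1))
results = attacker.attack_dataset()  # each result records the perturbed text
```

Because TextFooler selects words by their measured influence on the model's prediction rather than by raw frequency statistics, it succeeds in a significant proportion of cases where the tf-idf baseline does not.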