Pre-trained contextualized language models (PrLMs) have led to strong performance gains in downstream natural language understanding tasks. However, PrLMs can still be easily fooled by adversarial word substitution, one of the most challenging textual adversarial attack methods. Existing defense approaches suffer from notable performance loss and added complexity. This paper therefore presents a compact, performance-preserving framework: Anomaly Detection with Frequency-Aware Randomization (ADFAR). Specifically, we design an auxiliary anomaly detection classifier and adopt a multi-task learning procedure by which PrLMs learn to distinguish adversarial input samples. Then, to defend against adversarial word substitutions, a frequency-aware randomization process is applied to the recognized adversarial inputs. Empirical results show that ADFAR significantly outperforms recently proposed defense methods across various tasks, with much higher inference speed. Remarkably, ADFAR does not impair the overall performance of PrLMs. The code is available at https://github.com/LilyNLP/ADFAR.
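The frequency-aware randomization idea can be illustrated with a minimal sketch: adversarial word substitutions tend to introduce rarer words, so words below a frequency threshold in a flagged input are randomly replaced with more frequent synonyms. The toy frequency table, synonym map, and all parameter values below are illustrative assumptions, not the actual ADFAR resources or settings.

```python
import random

# Toy resources for illustration only; the actual method would draw word
# frequencies from a large corpus and synonyms from an external resource.
WORD_FREQ = {"good": 9000, "film": 7000, "terrible": 800, "atrocious": 40}
SYNONYMS = {"atrocious": ["terrible", "awful"]}

def frequency_aware_randomize(tokens, freq_threshold=100, sub_prob=0.9, seed=0):
    """Randomly replace rare (low-frequency) words in a flagged input
    with more frequent synonyms; frequent words are left untouched."""
    rng = random.Random(seed)
    out = []
    for tok in tokens:
        if WORD_FREQ.get(tok, 0) < freq_threshold and tok in SYNONYMS:
            # Only substitute with synonyms that are themselves frequent.
            candidates = [s for s in SYNONYMS[tok]
                          if WORD_FREQ.get(s, 0) >= freq_threshold]
            if candidates and rng.random() < sub_prob:
                out.append(rng.choice(candidates))
                continue
        out.append(tok)
    return out
```

With the toy tables above, a rare adversarial substitution such as "atrocious" is likely to be mapped back to the frequent synonym "terrible", while an unperturbed input like "good film" passes through unchanged.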