In recent years, NLP practitioners have converged on the following practice: (i) import an off-the-shelf pretrained (masked) language model; (ii) append a multilayer perceptron atop the CLS token's hidden representation (with randomly initialized weights); and (iii) fine-tune the entire model on a downstream task (MLP). This procedure has produced massive gains on standard NLP benchmarks, but these models remain brittle, even to mild adversarial perturbations such as word-level synonym substitutions. In this work, we demonstrate surprising gains in adversarial robustness enjoyed by Model-tuning Via Prompts (MVP), an alternative method of adapting to downstream tasks. Rather than modifying the model (by appending an MLP head), MVP instead modifies the input (by appending a prompt template). Across three classification datasets, MVP improves performance against adversarial word-level synonym substitutions by an average of 8% over standard methods, and even outperforms adversarial-training-based state-of-the-art defenses by 3.5%. By combining MVP with adversarial training, we achieve further improvements in robust accuracy while maintaining clean accuracy. Finally, we conduct ablations to investigate the mechanism underlying these gains. Notably, we find that the vulnerability of MLP can be attributed to two factors: the misalignment between the pre-training and fine-tuning tasks, and the randomly initialized MLP parameters. Code is available at https://github.com/acmi-lab/mvp
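The contrast between the two adaptation strategies can be sketched in a toy example. The snippet below is a minimal illustration, not the paper's implementation: `encode`, the weight matrices, the `[MASK]` id, and the verbalizer token ids are all hypothetical stand-ins for a real pretrained masked language model. It shows how standard fine-tuning (MLP) scores classes with a freshly initialized head on the CLS state, while MVP appends a prompt with a mask slot and reuses the pretrained MLM head to score label words, introducing no new random parameters.

```python
import numpy as np

rng = np.random.default_rng(0)
HIDDEN, VOCAB, NUM_CLASSES = 16, 32, 2

def encode(tokens):
    """Toy stand-in for a pretrained encoder: token ids -> hidden states.
    A real model would be a Transformer; here we use fixed pseudo-embeddings."""
    return np.stack([np.cos(np.arange(HIDDEN) * (t + 1)) for t in tokens])

# Pretrained MLM head (hidden state -> vocabulary logits); MVP reuses this.
W_mlm = rng.normal(size=(HIDDEN, VOCAB))

# (a) Standard fine-tuning: a *randomly initialized* MLP head on the CLS state.
W_mlp = rng.normal(size=(HIDDEN, NUM_CLASSES))  # new, random parameters

def mlp_predict(tokens):
    cls_state = encode(tokens)[0]  # position 0 plays the role of [CLS]
    return int(np.argmax(cls_state @ W_mlp))

# (b) MVP: append a prompt template ending in a [MASK] slot, then score a
# label word per class with the pretrained MLM head -- no new parameters.
MASK_ID = 31                 # hypothetical [MASK] token id
VERBALIZER = {0: 5, 1: 9}    # class -> label-word token id (hypothetical)

def mvp_predict(tokens):
    prompted = tokens + [MASK_ID]          # e.g. "x It was [MASK]."
    mask_state = encode(prompted)[-1]      # hidden state at the mask position
    vocab_logits = mask_state @ W_mlm
    return max(VERBALIZER, key=lambda c: vocab_logits[VERBALIZER[c]])

print(mlp_predict([3, 7, 2]), mvp_predict([3, 7, 2]))
```

Because MVP's scoring head is the same one used during masked-language-model pre-training, the fine-tuning task stays aligned with pre-training, which is exactly the source of robustness the ablations point to.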