Adversarial training (AT) is one of the most reliable methods for defending against adversarial attacks in machine learning. Variants of this method have been used as regularization mechanisms to achieve SOTA results on NLP benchmarks, and they have been found to be useful for transfer learning and continual learning. We investigate the reasons behind the effectiveness of AT by contrasting vanilla and adversarially fine-tuned BERT models. We identify the partial preservation of BERT's syntactic abilities during fine-tuning as the key to the success of AT. We observe that adversarially fine-tuned models remain more faithful to BERT's language modeling behavior and are more sensitive to word order. As concrete examples of syntactic abilities, an adversarially fine-tuned model can have an advantage of up to 38% on anaphora agreement and up to 11% on dependency parsing. Our analysis demonstrates that vanilla fine-tuning oversimplifies the sentence representation by focusing heavily on one or a few label-indicative words. AT, however, moderates the effect of these influential words and encourages representational diversity. This allows for a more hierarchical representation of a sentence and mitigates BERT's loss of syntactic abilities.