Although pre-trained language models have proven useful for learning high-quality semantic representations, these models are still vulnerable to simple perturbations. Recent works aiming to improve the robustness of pre-trained models mainly focus on adversarial training with perturbed examples that preserve similar semantics, neglecting the utilization of examples with different or even opposite semantics. Unlike the image processing field, text is discrete, and even a few word substitutions can cause significant semantic changes. To study the impact of such small perturbations on semantics, we conduct a series of pilot experiments and surprisingly find that adversarial training is useless or even harmful for the model's ability to detect these semantic changes. To address this problem, we propose Contrastive Learning with semantIc Negative Examples (CLINE), which constructs semantic negative examples in an unsupervised manner to improve robustness under semantically adversarial attacks. By comparing against examples with similar and opposite semantics, the model can effectively perceive the semantic changes caused by small perturbations. Empirical results show that our approach yields substantial improvements on a range of sentiment analysis, reasoning, and reading comprehension tasks. CLINE also ensures compactness within the same semantics and separability across different semantics at the sentence level.
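To make the contrastive objective concrete, here is a minimal sketch of learning with one positive (semantics-preserving, e.g. synonym-substituted) and one negative (semantics-reversing, e.g. antonym-substituted) example per anchor. This is an illustrative InfoNCE-style formulation with cosine similarity over toy vectors, not the paper's actual implementation; the function names, temperature, and embeddings are assumptions for demonstration only.

```python
import numpy as np

def cosine(a, b):
    # Cosine similarity between two embedding vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def contrastive_loss(anchor, positive, negative, tau=0.1):
    """InfoNCE-style loss with a single positive and a single
    semantic negative: pull the positive toward the anchor while
    pushing the negative away. tau is a temperature hyperparameter."""
    pos = np.exp(cosine(anchor, positive) / tau)
    neg = np.exp(cosine(anchor, negative) / tau)
    return float(-np.log(pos / (pos + neg)))

# Toy sentence embeddings (illustrative, not from a real encoder):
anchor    = np.array([1.0, 0.0])    # original sentence
pos_close = np.array([0.9, 0.1])    # synonym substitution: similar meaning
neg_far   = np.array([-1.0, 0.2])   # antonym substitution: opposite meaning

loss_good = contrastive_loss(anchor, pos_close, neg_far)
loss_bad  = contrastive_loss(anchor, neg_far, pos_close)  # roles swapped
```

When the positive lies near the anchor and the negative lies opposite it, the loss is near zero; swapping the roles makes the loss large, which is the gradient signal that teaches the encoder to separate small perturbations that flip the semantics from those that preserve it.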