Recent work has demonstrated that tuning continuous prompts on large, frozen pretrained language models (i.e., prefix tuning or P-tuning) can yield performance comparable or even superior to fine-tuning. Nevertheless, the effectiveness of such methods under data augmentation, a common strategy for improving learning in low-data regimes, has not been studied. In this paper, we examine several popular task-agnostic data augmentation techniques, i.e., EDA, Back Translation, and Mixup, when used with prefix tuning under data scarcity. We show that data augmentation can boost the performance of prefix-tuned models, but that the effectiveness of each technique varies, and certain methods can lead to a notable degradation in performance, particularly with larger models and on harder tasks. To help understand this behaviour, we run experiments which reveal that prefix tuning generally has a limited ability to separate the sentence embeddings of different classes of augmented data, and performs especially poorly on heavily altered data. We also demonstrate that adding a simple contrastive loss can mitigate these issues for prefix tuning, improving performance on augmented data.